The quantitative structure-property relationship (QSPR) approach has been applied
to develop models to predict normal boiling points, gas solubilities and the nonspecific
solvent polarity scale (S'). These models provide fundamental insights into the relevant
inter-  and  intramolecular  interactions  within  pure  solvents,  and  between  solvents  and 
solutes. 

Normal boiling point models were developed for a structurally wide variety of
organic compounds using the CODESSA (comprehensive descriptors for structural and
statistical analysis) technique. A two-parameter correlation was developed (R^2 = 0.9544, s
= 16.2 K) which involved a bulk cohesiveness descriptor, G_I^(1/3), and the area-weighted
surface charge of the hydrogen-bonding donor atom(s) in the molecule. A more refined
QSPR model (with R^2 = 0.9732 and s = 12.4 K) includes, in addition, the most negative
atomic partial charge and the number of chlorine atoms in the molecule. The model is
theoretically  justified  and  provides  significant  additional  insight  into  the  relationship 
between  the  structure  and  the  boiling  points  of  the  compounds.  The  QSPR  equations 
developed  allow  remarkably  accurate  predictions  of  the  normal  boiling  points  for  diverse 
organic  compounds  and  a  number  of  simple  inorganic  compounds,  including  water. 

QSPR  models  developed  for  the  prediction  of  the  solubilities  of  organic  gases  and 
vapors  in  water  involved  a  limited  number  of  theoretical  molecular  descriptors.  These 
descriptors  are  mostly  derived  from  the  quantum-chemical  charge  distribution  of  the 
molecules,  and  have  definite  physical  meaning  corresponding  to  different  solute-solvent 
interactions in solutions. A two-parameter correlation with the squared correlation
coefficient R^2 = 0.977 gives excellent predictions for 95 alkanes, cycloalkanes, alkenes,
alkylarenes and alkynes. A satisfactory description (R^2 = 0.941) of the gas solubilities of
406 organic compounds with a large structural variability was obtained using a five-parameter
QSPR equation.

QSPR models for the unified nonspecific solvent polarity scale (S') were
developed using weighted and unweighted correlation analysis. The weighted
correlation equation involves two orthogonal theoretical molecular descriptors: the molar
volume weighted total dipole moment of the molecule and the HOMO-LUMO energy
gap. These two descriptors provide a fundamental physical description of nonspecific
solvation at the molecular level. The unweighted correlation equation involves three
theoretical  descriptors.  Both  equations  allow  reliable  prediction  of  S'  for  diverse  solvents. 



CHAPTER  I 
GENERAL  INTRODUCTION 


The application of Quantitative Structure-Property/Activity Relationships
(QSPR/QSAR) is a rapidly evolving discipline which connects the information encoded
within the structure of a molecule to the properties the compound exhibits. The utility of
this approach is based on the abundance of information on molecules, derived from
quantum-chemical, physical or information theoretical descriptions. In principle, the
quantitative  relationships  between  molecular  structure  and  properties  can  be  established 
for  any  organic  compound  amenable  to  molecular  orbital  calculation. 

There  is  no  doubt  that  the  structure  of  a  compound  determines  its  properties. 
However,  due  to  limitations  arising  primarily  from  the  theoretical  basis,  quantitative 
calculations  of  physical  and  chemical  properties  of  chemical  compounds  from  first 
quantum mechanical principles are not currently feasible. Therefore, QSPR/QSAR has
been developed as an alternative approach to predict the properties of chemical
compounds. Various physical, chemical, and biological properties [86MIk, 92MIh,
95JCICS841, 95JCIC1039, 96JCICS100, 94AC1799, 95CSR279, 96JPC10400,
96CR1027], toxicities [86MIk, 91JMC1668, 96CR1027], and chemical reactivity
[96CR1027] have been predicted using the QSPR/QSAR approach.

The structural information used in the development of QSPR can be retrieved
solely from the theoretically derived geometrical and electronic structure of molecules.
Therefore, the QSPR can provide insight into the essential molecular structural features
that  affect  the  properties  of  a  compound.  The  theoretical  QSPR  approach  has  been  widely 
used  for  the  prediction  of  a  variety  of  chemical  and  physical  properties  of  molecules, 
including  their  analytical  characteristics,  spectroscopic  data  and  chemical  reactivity 
[96CR1027]. The same technique, known alternatively as the Quantitative
Structure/Activity Relationship (QSAR) approach, has been used to assist in the molecular
design  of  new  pharmacological  agents  [86MIk,  92MIh,  96CR1027],  in  studies  of 
enzymatic  reaction  mechanisms  [86MIk,  96CR1027],  and  in  the  development  of  predictive 
models  for  environmental  pollution  prevention  and  monitoring. 


The Basic Nature of QSPR/QSAR


The essence of QSPR/QSAR is the representation of a property (P) of a series
of compounds through an expansion, taken over molecular structure in terms of "indices"
or "descriptors" (x_j) (Equation 1.1a).

$$P = f(x_j), \qquad j = 1, 2, 3, \ldots \tag{1.1a}$$

$$P = a_0 + \sum_j a_j x_j \tag{1.1b}$$

In the multilinear equation (1.1b), a_j (j = 1, 2, 3, ...) denote the regression coefficients for
the structural descriptors (x_j) and describe the sensitivity of the property to the given
descriptor, whereas a_0 is the "standard" value of the property P, corresponding to the zero
value of all descriptors x_j involved in the equation. The dependent variables (P) are, as a
rule,  the  experimental  values  of  the  corresponding  property.  In  most  cases,  these 
dependent  variables  are  limited  to  the  properties  of  pure  compounds.  The  QSPR 
description  of  the  properties  of  mixtures  which  are  dependent  on  the  collective 
interactions  between  the  individual  components  of  the  mixture  may  require  more  complex, 
nonlinear  forms  of  Equation  (1.1a),  for  instance,  with  the  involvement  of  cross-terms 
between  the  descriptors  for  individual  components.  Various  theoretically  calculated 
constitutional,  geometrical,  topological  and  quantum-chemical  descriptors,  the  principal- 
component  orthogonalized  space  of  theoretical  and/or  empirical  descriptors,  or  empirical 
QSPR  scales  that  are  derived  from  experimental  observations,  can  be  used  as  independent 
variables (x_j). Multilinear regression analysis is primarily used as the statistical
technique to determine the coefficients a_0 and a_j of Equation (1.1b). In the case of more
general nonlinear equations, the nonlinear least squares method has to be applied to
determine the characteristic coefficients of such equations.


Descriptors

A virtually unlimited number of molecular descriptors can be calculated from the
molecular  structure  alone.  These  include  non-empirical  or  theoretical  descriptors  which 
can  be  subdivided  into  several  classes  depending  on  the  structural  information  used  in  their 
development.  The  constitutional  descriptors  are  derived  simply  from  the  atomic 
constitution and atomic connectivity of the molecule. Examples of constitutional
descriptors are the numbers of atoms and bonds in the molecule, as well as the molecular
weight of the compound. The so-called binary variables, which describe the presence or
absence of a given substituent at a given location in the molecule, are also
constitutional descriptors. The topological descriptors characterize the relative
disposition  or  connectivity  of  atoms  in  the  molecule  and  quantify  the  molecular  branching. 
The commonly used topological descriptors include the Wiener index [47JACS17], Randic
indices  [75JACS6609],  Kier  and  Hall  connectivity  indices  [86MIk],  and  information 
content  indices  [86MIk].  The  geometrical  descriptors  reflect  the  size  and  shape  of  the 
molecules  and  are  derived  from  the  3-D  coordinates  of  the  atomic  nuclei  in  the  molecule. 
The  latter  are  calculated  either  from  the  standard  bond  lengths  and  bond  angles  or  using 
the  quantum  mechanical  or  molecular  mechanical  optimized  geometry  of  the  molecule. 
Common  examples  of  the  geometrical  descriptors  include  the  molecular  moments  of 
inertia,  volume  of  the  molecules,  solvent  accessible  surface  area,  and  various  shadow 
indices [87AC1048, 87ACA99, 88AC978]. The electronic descriptors describe the
intramolecular inductive, resonance and hyperconjugative effects. These descriptors are
mainly  derived  from  experimental  kinetic,  thermodynamic  and  spectroscopic  data.  The 
quantum  chemical  descriptors  are  derived  from  the  quantum-chemical  total  molecular 
wave function of a molecule [94CT17, 93JCICS835, 94AC1799]. Typical examples
of such descriptors include the molecular orbital energies and superdelocalizabilities;
atomic partial charges [82JCC407, 88JCC288, 92JCC492]; partial negatively and
positively charged surface areas of compounds [90AC2323]; electrical moments and
polarizabilities of the molecules; spectroscopic transition energies; and bonding and
reactivity indices [92MIt, 92QSAR162, 96CR1027]. Other theoretical descriptors include
electrostatic and steric potential maps, different theoretically predicted thermodynamic
functions and solvation characteristics of the molecules [96CR1027].

In  addition  to  the  above  described  non-empirical  descriptors,  empirical  descriptors, 
derived  from  experimental  data  or  otherwise,  are  also  used  in  QSPR/QSAR  studies. 
Spectroscopically  derived  descriptors  and  group  contribution  units  are  two  examples.  The 
former include Drago's S', Dimroth and Reichardt's E_T(30), Kamlet, Abboud, and Taft's
π*, Kosower's Z, Brooker's χ_R, and Dong and Winnik's Py [88MIr, 94CR2319]. The
latter encompass a wide range of group or bond units which are predefined structural
partitions of a molecule [94MIh].


Statistical Methods

A variety of statistical methods are used in QSPR/QSAR studies. Multiple
(linear) regression analysis (MRA) has been the most widely used technique in that regard.
Principal component analysis (PCA) [84MId, 86ACA1], nonlinear iterative partial least
squares (NIPALS) [86ACA1], and target transformation (TT) [84MId] algorithms have
also been implemented in many cases. The MRA scheme seeks the explanation or
prediction of the dependent variable, property/activity (P), using the set of independent
variables/descriptors (x_j). The multilinear equation is written as:

$$P = a_0 + \sum_j a_j x_j + \text{error} \tag{1.2}$$

In matrix notation, the regression coefficients a_j can be calculated as:

$$A = (X^T X)^{-1} X^T P \tag{1.3}$$

where  A  denotes  the  vector  of  regression  coefficients,  X  represents  the  matrix  of 
independent  variables  (descriptors),  P  is  the  vector  of  the  dependent  variables 
(property/activity),  and  superscript  T  on  a  matrix  indicates  the  transpose  of  the  matrix. 
Overfitting  and  chance  correlations  may  arise  when  an  excessive  number  of  independent 
variables are used in the expansion (eq 1.2). In such a case, the errors, random or
otherwise, in the dependent variables will be fitted by the equation. In addition, the
collinearity of descriptor scales used in Equation (1.2) often results in a major loss of
information. The quality of the correlation can be described by a number of statistical
criteria of which the correlation coefficient R (eq 1.4), standard deviation s (eq 1.5), and
F-value (eq 1.6) are most commonly used:

$$R^2 = 1 - \frac{\sum_i (P_{i,obs} - P_{i,calc})^2}{\sum_i (P_{i,obs} - \bar{P}_{obs})^2} \tag{1.4}$$

$$s^2 = \frac{\sum_i (P_{i,obs} - P_{i,calc})^2}{n - k - 1} \tag{1.5}$$

$$F = \frac{R^2 (n - k - 1)}{k (1 - R^2)} \tag{1.6}$$


where n and k denote the number of property values and descriptors, respectively, and n -
k - 1 is the number of statistical degrees of freedom. The "robustness" of Equation
(1.1b), that is, whether the correlation depends strongly on certain highly influential
data points for its validity, is measured by the cross-validated correlation coefficient R^2_cv
or by external prediction. In cross-validation, the property value for each
compound from the data set is in turn predicted from the regression equation calculated
from the data for all compounds except the one for which the value is predicted. In the
external prediction method, the property data are predicted for compounds not used in
the statistical treatment. The quality of such prediction characterizes the reliability of the
model.
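
The regression workflow described above (Equations 1.3 to 1.6 followed by leave-one-out cross-validation) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the CODESSA implementation: the function names, the tiny two-descriptor data set and its "noise" values are all made up for the example.

```python
def solve_linear(M, b):
    """Solve M a = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [[float(v) for v in M[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    a = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * a[j] for j in range(i + 1, n))
        a[i] = (aug[i][n] - tail) / aug[i][i]
    return a


def fit(X, P):
    """Return [a0, a1, ..., ak] from the normal equations (X^T X) A = X^T P."""
    D = [[1.0] + list(row) for row in X]  # leading 1 carries the intercept a0
    m = len(D[0])
    XtX = [[sum(r[i] * r[j] for r in D) for j in range(m)] for i in range(m)]
    XtP = [sum(r[i] * p for r, p in zip(D, P)) for i in range(m)]
    return solve_linear(XtX, XtP)


def predict(coef, row):
    return coef[0] + sum(a * x for a, x in zip(coef[1:], row))


def regression_stats(X, P):
    """R^2 (eq 1.4), s (eq 1.5) and F (eq 1.6) for a multilinear fit."""
    n, k = len(P), len(X[0])
    coef = fit(X, P)
    ss_res = sum((p - predict(coef, r)) ** 2 for r, p in zip(X, P))
    mean = sum(P) / n
    ss_tot = sum((p - mean) ** 2 for p in P)
    r2 = 1.0 - ss_res / ss_tot
    s = (ss_res / (n - k - 1)) ** 0.5
    f_value = r2 * (n - k - 1) / (k * (1.0 - r2))
    return {"coef": coef, "r2": r2, "s": s, "F": f_value}


def loo_r2(X, P):
    """Leave-one-out cross-validated R^2: each point is predicted without itself."""
    n = len(P)
    mean = sum(P) / n
    ss_tot = sum((p - mean) ** 2 for p in P)
    press = 0.0
    for i in range(n):
        coef = fit(X[:i] + X[i + 1:], P[:i] + P[i + 1:])
        press += (P[i] - predict(coef, X[i])) ** 2
    return 1.0 - press / ss_tot


# Synthetic demonstration data (hypothetical, not measured values):
# P = 10 + 2*x1 + 3*x2 plus a small fixed perturbation.
X_demo = [[1, 0], [2, 1], [3, 1], [4, 3], [5, 2], [6, 5], [7, 4], [8, 6]]
noise = [0.10, -0.20, 0.15, -0.05, 0.20, -0.10, 0.05, -0.15]
P_demo = [10 + 2 * x1 + 3 * x2 + e for (x1, x2), e in zip(X_demo, noise)]

stats = regression_stats(X_demo, P_demo)
q2 = loo_r2(X_demo, P_demo)
```

The cross-validated value can never exceed the fitted R^2 for an ordinary least-squares model, so a large gap between the two is a simple warning sign of overfitting or of a few highly influential points.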

In PCA, an original data matrix X of dimension n × k (n = number of observations,
k = number of variables) is decomposed into two matrices, T (n × c) and P (c × k), with c being the
number of principal components: X = TP. The row vectors in P are the projections of
the corresponding principal component on the original variables. The column vectors in
T are the projections of the sample points on the corresponding principal component
vectors. The principal components in PCA can be calculated by diagonalization of the
respective data matrix, or by the nonlinear iterative partial least squares method (NIPALS)
[86ACA1]. Correlation of property/activity using principal components is called
principal component regression (PCR) [86ACA1, 86MIJ]. The principal components are
usually difficult to define in terms of the physical interactions determining a given
property. Therefore, the ambiguity of the abstract PCA factor scales hinders or even
precludes the physical interpretation of the QSPR/QSARs developed. Several approaches
have  been  proposed  for  a  better  representation  of  the  abstract  factor  space  (target  testing, 
error  distribution)  but  no  satisfactory  solution  to  this  problem  has  been  developed. 
Nevertheless,  PCA  provides  some  insight  into  the  underlying  structure  of  data  by 
simplifying  the  complexities  via  data  reduction  [84MId].  The  identification  of  the 
principal  components  of  the  properties  or  correlation  scales  can  guide  the  selection  and 
construction  of  the  descriptors  used  in  MRA. 
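
The decomposition X = TP and the NIPALS iteration mentioned above can be illustrated with a short pure-Python sketch. It is a minimal version written for this example, assuming column-centred data and no missing values. The function name, the tolerance and the small demonstration matrix (whose third column is the sum of the first two, so two components reproduce it exactly) are assumptions, not part of any cited program.

```python
def nipals_pca(X, n_components, tol=1e-12, max_iter=500):
    """Return (T, P, E): score vectors, loading vectors and the residual matrix.

    T[c] is the c-th score vector (length n) and P[c] the c-th loading
    vector (length k), so the centred data equal sum_c outer(T[c], P[c]) + E.
    """
    n, k = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(k)]
    E = [[row[j] - means[j] for j in range(k)] for row in X]  # centred data
    T, P = [], []
    for _ in range(n_components):
        # Start from the column of the residual with the largest sum of squares.
        j0 = max(range(k), key=lambda j: sum(row[j] ** 2 for row in E))
        t = [row[j0] for row in E]
        for _it in range(max_iter):
            tt = sum(v * v for v in t)
            # Loading: project the residual columns onto the current scores.
            p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(k)]
            norm = sum(v * v for v in p) ** 0.5
            p = [v / norm for v in p]
            # Scores: project the samples onto the normalised loading.
            t_new = [sum(E[i][j] * p[j] for j in range(k)) for i in range(n)]
            change = sum((a - b) ** 2 for a, b in zip(t, t_new))
            t = t_new
            if change <= tol * sum(v * v for v in t):
                break
        # Deflate: remove the extracted component before finding the next one.
        E = [[E[i][j] - t[i] * p[j] for j in range(k)] for i in range(n)]
        T.append(t)
        P.append(p)
    return T, P, E


# Hypothetical 5 x 3 data matrix; column 3 = column 1 + column 2.
X_demo = [[1, 2, 3], [2, 1, 3], [3, 4, 7], [4, 3, 7], [5, 6, 11]]
T, P, E = nipals_pca(X_demo, 2)
max_residual = max(abs(v) for row in E for v in row)
```

Because the demonstration matrix has only two independent columns, the residual left after two components is numerically zero, which is the data-reduction property the text refers to.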


The CODESSA Approach


A  number  of  computer  programs  are  available  commercially  or  otherwise  for  the 
development  of  QSPR.  The  ADAPT  program  of  Jurs  [79MIs]  and  the  SPARC  software 
developed  at  U.S.  Environmental  Protection  Agency  [94TCC291]  have  been  visionary  and 
led  to  significant  advances  in  different  areas  of  QSPR/QSAR.  More  recently,  the 
CODESSA (COmprehensive DEscriptors for Structural and Statistical Analysis [94MIk])
approach  was  developed  by  the  Katritzky  research  group  at  the  University  of  Florida  and 
the  Karelson  research  group  at  the  University  of  Tartu.  This  approach  has  already  been 
used  successfully  to  predict  a  large  number  of  molecular  properties  including  GC  retention 
indices,  GC  response  factors  [94AC1799],  melting  points,  boiling  points,  flash  points,  and 
octanol-water partition coefficients [94CT17, 96RRCsub], critical micelle concentrations
[96L1462],  and  the  glass  transition  temperatures  of  polymers  [96JCICS879].  In  principle, 
any  chemical  or  physical  property  of  some  series  of  compounds,  or  their  biological 
activity,  can  be  analyzed  using  this  approach  and  related  to  the  chemical  structure  of  a 
compound. A large number (up to 1000) of molecular descriptors can be calculated solely
from  the  molecular  structure  using  CODESSA  (Appendix  I).  These  descriptors  are 
calculated  using  algorithms,  either  critically  selected  from  the  literature  or  developed  by 
the  authors  of  CODESSA.  Because  all  the  descriptors  are  derived  theoretically,  the 
CODESSA  QSPR/QSAR  technique  can  be  easily  applied  to  either  previously  unknown  or 
even  not  yet  synthesized  compounds. 


In the framework of CODESSA, a variety of important statistical methods is
available  for  the  development  of  QSPR/QSAR  and  for  performing  the  cluster  analysis  of 
the  properties  of  compounds.  The  two-dimensional  study  of  the  property  or  descriptor 
data  involves  the  analysis  of  the  intercorrelation  of  different  properties  or  different 
descriptors.  A  number  of  methods  based  on  the  (multi)linear  regression  technique  are 
available in CODESSA to search for the best, i.e. the most predictive, structure-property
correlations. Two strategies, associated with MRA, are applied for the selection
of the best correlation over a large descriptor space: (i) the best multilinear regression
analysis [96RRCsub] and (ii) the heuristic method [96RRCsub]. Both procedures are
described in Appendix II.

Using  CODESSA,  the  molecular  properties,  the  descriptors,  or  their  combinations 
can also be analyzed using factor analysis-based pattern recognition methods. Principal
component analysis (PCA) [84MId, 86ACA1], the nonlinear iterative partial least squares
(NIPALS) [86ACA1], and target transformation (TT) [84MId] algorithms have been
implemented  for  this  purpose.  The  output  of  the  program  includes  the  numerical  values  of 
the  descriptors  and/or  properties,  the  results  of  the  statistical  analysis,  and  the  graphical 
representation of various statistical treatments.


Objective

The  QSPR  models  of  the  physical  properties  of  diverse  compounds  have 
commonly  involved  a  relatively  large  number  of  molecular  descriptors  [84JACS1205, 
90JCC493,  90AC2323,  91JCICS301,  92JCICS306,  93JCICS616,  94JCICS947, 
95JCICS68]. For example, a statistically satisfactory description of various liquid state
properties of compounds with diverse molecular structure has usually been achieved by
involving six to ten parameters [84JACS1205, 90JCC493, 90AC2323,
91JCICS301, 92JCICS306, 94JCICS947]. On the other hand, a smaller number of
intermolecular interactions are expected to be responsible for the property differences
among the compounds [80JACS1837]. By employing the PCA technique, Cramer
[80JACS1837] discovered that only two dimensions (denoted by him as the "bulk" and the
"polar cohesiveness") should be the major intrinsic dimensions of the intermolecular
interactions in the liquid state. These two dimensions were derived on the basis of six
physical properties for a set of 114 diverse compounds, and they accounted for about 96%
of the variance in the properties. The two dimensions (components) were later interpreted
as dispersion interaction and polar cohesiveness dominated by hydrogen-bonding as well
as dipole interaction contributions [80JACS1837]. However, for simple physical
properties of bulk liquids (pure solvents) such as boiling points, at least six descriptors
were necessary to confine the prediction errors within 15 K [91JCICS301, 92JCICS306,
93JCICS616]. The question arises as to whether six or more descriptors are really
necessary for boiling point models, or do the two principal components actually contain
more  dimensions?  In  other  words,  what  are  the  main  molecular  features  that  are  actually 
responsible  for  the  inter-  and  intramolecular  interactions  among  pure  solvents?  In  order  to 
answer  this  question,  it  is  necessary  to  investigate  whether  there  are  overlapping  factors  in 
the  descriptor  space  used  in  the  previous  boiling  point  QSPR  models.  Therefore,  the  first 
objective  of  this  dissertation  is  to  screen  the  redundant  structural  representations  of  inter- 
and  intramolecular  interactions  in  pure  solvents  and  to  improve  the  formulation  of  the 
relationship  between  the  boiling  points  and  molecular  structure. 

In  the  case  of  the  properties  involving  solvents  and  solutes,  empirical  scales  have 
been  widely  used  for  the  predictions.  These  scales  are  mainly  calculated  from  the 
experimental  data  of  similar  type.  One  of  them,  termed  the  linear  solvation  energy 
relationship  (LSER),  was  developed  by  Kamlet,  Taft  and  Abraham  [77JACS6027].  The 
LSER  set  of  5  descriptors  has  been  demonstrated  to  be  a  very  powerful  tool  in  the 
characterization  of  physical,  chemical  and  toxicological  properties  [77JACS8325, 
83JOC2877,  88QSAR71,  88MIr,  94CR2319].  In  addition,  the  coefficients  of  the 
descriptors  in  the  correlation  equation  can  provide  insight  into  the  nature  of  the  solute- 
solvent  interactions. 

The underlying principle of the LSER is that any property that relies on
solute/solvent interactions can be divided into contributions from four effects, as shown by
the following equation:

Property = Steric + Polarizability/Dipolarity
           + Hydrogen Bonding Acidity + Hydrogen Bonding Basicity          (1.7)

The LSER descriptors (also called the solvatochromic descriptors) were derived
from  gas  chromatography  and  UV-Vis  spectral  shifts  with  specific  indicators  [92JC229, 
93CSR73,  93JC95].  It  is  only  possible  to  consider  either  multiple  solutes  in  a  single 
solvent,  or  multiple  solvents  and  a  single  solute.  Furthermore,  the  LSER  descriptors  are 
limited  in  their  ability  to  make  a  priori  prediction  due  to  their  empirical  nature.  The 
determination  of  LSER  values  for  new  chemical  compounds  requires  several  experimental 
observations. 

Meanwhile,  theoretical  chemistry  has  been  gradually  utilized  to  provide  descriptors 
for QSAR [86MIk, 91JMC1668, 95CC209, 96CR1027]. Based on the LSER philosophy
and general structures, Famini and coworkers [91JMC1668, 92QSAR162, 93ACR599,
93JCSPTII773, 95CC209] have developed a new set of computationally derived
descriptors  called  the  theoretical  LSER  (TLSER).  The  TLSER  attempts  to  maintain  the 
same  relationship  between  property  and  parameters  by  incorporation  of  steric, 
polarizability  and  hydrogen  bonding  terms  calculated  from  the  same  theoretical  data. 

TLSER  uses  a  set  of  six  descriptors.  Each  parameter  describes  a  single, 
orthogonal  molecular  event  or  parameter.  The  advantage  of  using  this  single  set  of 
descriptors  has  been  demonstrated  several  times  by  Famini  and  Wilson  by  its  ability  to 
describe different properties and data sets [92QSAR162, 93JCSPTII773, 93ACR599,
95CC209,  94JCSPTII1647,  96JCSPTII83]. 

In  general,  theoretical  descriptors  are  very  powerful  in  the  following  ways:  First, 
the  descriptors  are  easily  computed  for  almost  any  chemical  species,  which  dramatically 
increases the correlative ability of the method by increasing the number of compounds that
can be used in the data set. Secondly, the resulting correlations relate an empirical
(macroscopic)  property  to  molecular  (microscopic)  parameters.  Thus,  how  molecular 
features  affect  the  observed  property  can  be  elucidated.  Also,  it  is  almost  always  possible 
to  proceed  from  the  nearly  orthogonal  set  of  theoretical  descriptors  which  minimizes  the 
problems  related  to  the  intercorrelation  between  independent  variables. 

The  TLSER  possesses  the  above  stated  advantages  but  does  not  contain  explicitly 
any  information  about  the  solute,  which  limits  its  ability  to  represent  the  full  details  about 
solute/solvent  interactions.  It  is  known  that  the  complexity  and  diversity  of  the  solvent- 
solute interactions are very difficult to quantify without losing generality.

In fact, a unified solvation model, proposed by Drago [92JCSPTII1827,
92JOC6547, 94JCSPTII219, 94JCC145, 94JACS7509, 95JPC6563, 96JCSPTIIsub],
involves both solvent (S') and solute (P) contributions:

$$\Delta\chi = S'P + W \tag{1.8}$$

where Δχ and W are the property and intercept, respectively. Such a model accounts for
the differences among the solutes and unifies the solvent polarity as an independent
variable. The deviations from predicted data using the unified solvation model allow an
analysis of the mechanisms that underlie the additional effects, instead of requiring a
search for another scale [92JCSPTII1827]. Nevertheless, the empirical solvent parameter
involved in the model has neither a definite physical meaning nor a calculable numerical
representation. Thus, it is of great theoretical and practical interest to explore the
theoretical representations and intrinsic dimensions of the empirical solvent parameter
utilized  in  the  unified  solvation  model. 
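
As a rough illustration of how Equation (1.8) is used, the sketch below fits the solute parameter P and the intercept W for one probe from its property values in several reference solvents, then inspects the deviation for an additional solvent. All numbers are hypothetical placeholders chosen for the example; they are not Drago's S' values or measured data, and the function name is an assumption.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical solvent polarities S' for six reference solvents.
s_prime = [1.1, 1.5, 1.7, 2.1, 2.6, 3.0]
# Hypothetical probe property generated from P = -1.2, W = 20 with small scatter.
scatter = [0.02, -0.01, 0.01, -0.02, 0.01, -0.01]
observed = [20.0 - 1.2 * s + e for s, e in zip(s_prime, scatter)]

P_fit, W_fit = fit_line(s_prime, observed)

# A further (hypothetical) solvent whose observed value is 1.5 units lower than
# nonspecific solvation alone would give, e.g. because of a specific interaction.
s_new, observed_new = 2.8, (20.0 - 1.2 * 2.8) - 1.5
deviation = observed_new - (P_fit * s_new + W_fit)
```

A deviation much larger than the scatter of the reference solvents is the kind of signal the text describes: it points to an additional, specific effect rather than to a need for a new polarity scale.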


Outline  of  the  Remaining  Chapters 


In  this  dissertation,  chapter  two  will  reveal  the  molecular  features  that  are 
responsible  for  simple  liquid  properties  including  the  normal  boiling  points  and  critical 
temperatures of pure liquids. A better representation of the bulk liquid intermolecular
interactions will be discussed. In chapter three, a QSPR study of the solubility of gases
and  vapors  in  water  and  hexadecane  will  be  presented.  The  comparison  will  be  made 
between  an  empirical  approach  and  the  CODESSA  approach.  The  solvation  mechanism 
of  polar  and  non-polar  gases  and  vapors  dissolving  in  water  and  hexadecane  will  be 
discussed.  In  chapter  four,  a  novel  exploration  of  the  physical  meaning  and  theoretical 
representation  of  the  unified  non-specific  solvent  polarity  scale  will  be  presented.  In 
chapter  five,  the  significance  of  the  present  study  and  future  work  will  be  discussed. 


CHAPTER  II 

QSPR  MODELS  FOR  THE  NORMAL  BOILING  POINT  OF 
DIVERSE  ORGANIC  COMPOUNDS 


Introduction 


Among  various  physical  properties  of  a  compound,  the  normal  boiling  point  is  one 
of  the  most  important  for  the  identification  of  an  unknown  substance.  This  property 
serves  as  a  measure  of  volatility,  which  in  conjunction  with  other  properties,  determines 
the hazards of a chemical. Availability of boiling point data is also essential for
separation by distillation and gas chromatography. Furthermore, knowledge of the boiling
point is also used in the estimation of other physical properties, such as the critical
temperature [89CE157], flash point [91MI97], molar volume [91JAC2273], and enthalpy
of vaporization [82MII, 95JACS5013]. Although experimental techniques for measuring
boiling  points  are  well  developed,  data  for  compounds  which  are  unavailable,  unknown,  or 
difficult  to  handle  must  be  obtained  by  an  estimation  procedure. 

The  understanding  of  the  boiling  point  is  best  formulated  by  Trouton's  rule,  which 
involves the ratio of enthalpy and entropy at the phase transition between liquid and
vapour. In principle, the boiling point is determined by the intermolecular interactions in
the liquid, and by the difference in the molecular internal partition function in the gas and
liquid phases at boiling temperature. A great deal of effort has been made to quantify the
relationship between the normal boiling point of a compound and its molecular structure.
Walker  was  the  first  to  show  the  correlation  between  the  normal  boiling  point  (in  absolute 
temperature T) and the molecular mass (M), T = aM^b, where a and b are constants for
certain  homologous  series  [(18)94CS193].  Later,  other  correlations  were  proposed 
employing  physical  parameters,  such  as  the  parachor  and  the  molar  refractivity  which  are 
influenced  by  the  molecular  structure  [49CEP149].  Previous  methods  to  estimate  boiling 
points  have  been  summarized  by  Rechsteiner  [82MIr]  and  by  Horvath  [92MIh]. 
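
A power law of the Walker type, T = aM^b, becomes a straight line after taking logarithms, ln T = ln a + b ln M, so a and b can be estimated by simple linear regression. The sketch below does this for the n-alkanes methane to octane. The molar masses and boiling points are rounded handbook values supplied for this illustration (they do not come from the text), and the function name is an assumption.

```python
import math

# (name, molar mass in g/mol, normal boiling point in K), rounded handbook values
ALKANES = [
    ("methane", 16.04, 111.7),
    ("ethane", 30.07, 184.6),
    ("propane", 44.10, 231.1),
    ("butane", 58.12, 272.7),
    ("pentane", 72.15, 309.2),
    ("hexane", 86.18, 341.9),
    ("heptane", 100.20, 371.6),
    ("octane", 114.23, 398.8),
]


def fit_power_law(masses, temps):
    """Fit T = a * M**b by least squares on ln T = ln a + b ln M."""
    xs = [math.log(m) for m in masses]
    ys = [math.log(t) for t in temps]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


a, b = fit_power_law([m for _, m, _ in ALKANES], [t for _, _, t in ALKANES])
hexane_estimate = a * 86.18 ** b  # back-calculated boiling point of hexane, in K
```

For this homologous series the fitted exponent b falls between 0.6 and 0.7, and the constants would change for a different series, which is why such correlations are limited to closely related compounds.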

The  group  contribution  additivity  (GCA)  is  known  as  a  straightforward  and  simple 
method  extensively  used  for  predicting  boiling  points.  The  GCA  presumes  that  the 
cohesion forces in the liquid have predominantly short-range character; accordingly,
the molecule is divided into predefined structural groups, and each group adds a constant
increment to the value of a property for a compound [88MI351]. Kinney proposed the
use of aliphatic boiling point numbers [38JACS3032], which are group increments of
boiling  points  for  aliphatic  compounds.  In  his  model,  the  predicted  data  were  the 
functions  of  the  boiling  point  in  the  one  third  power.  One  of  the  largest  collections  of 
group  increments  for  the  prediction  of  normal  boiling  points  was  assembled  by  Joback  and 
Reid  [87CEC233]  and  further  extended  by  Stein  and  Brown  [94JCICS581].  The  final  85 
group  increments  were  derived  from  a  set  of  4426  diverse  organic  compounds  and 
provided  an  average  absolute  error  of  15.5  K  for  predicted  boiling  points. 
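
The additivity idea can be made concrete with a very small sketch: the estimate is a base value plus the sum of group increments multiplied by their counts. The base value and the three increments below are quoted from memory in the style of the Joback and Reid table and should be treated as assumptions to be checked against the original source. The dictionary and function names are hypothetical.

```python
# Assumed base value and group increments in kelvin (verify before real use).
BASE_K = 198.0
INCREMENTS_K = {
    "CH3": 23.58,  # methyl group
    "CH2": 22.88,  # non-ring methylene group
    "OH": 92.88,   # alcohol hydroxyl group
}


def estimate_boiling_point(group_counts):
    """Base value plus the sum of (count * increment) over all groups."""
    unknown = set(group_counts) - set(INCREMENTS_K)
    if unknown:
        # The method cannot handle groups without an established increment.
        raise KeyError(f"no increment available for: {sorted(unknown)}")
    return BASE_K + sum(INCREMENTS_K[g] * n for g, n in group_counts.items())


hexane_estimate = estimate_boiling_point({"CH3": 2, "CH2": 4})
butanol_estimate = estimate_boiling_point({"CH3": 1, "CH2": 3, "OH": 1})
```

The explicit error for an unknown group reflects the main practical limitation of group contribution schemes: a compound containing even one group outside the table cannot be estimated.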

The group contribution methods for estimating boiling points are limited to the
types of compounds for which all group contributions have been established. The QSPR
approach is found to be more useful because it can employ descriptors derived solely from
the structure of a molecule. Pioneering work in applying QSPR to boiling points was
done by Wiener, who introduced the path number w (later named the Wiener index),
defined as the sum of the distances between any two carbon atoms in the molecule
[47JACS17]. Using w along with a polarity number, he predicted the boiling points of
paraffins with an average error of 1 °C. Other topological indices, including the Randic
[75JACS6609] and Kier & Hall [76MIh] molecular connectivity indices, have been
successful in correlating the boiling points of alkanes and amines [76MIh]. Seybold et al.
[88JACS4186] obtained an excellent correlation (R² = 0.999) of the boiling points of 74
normal and branched alkanes using a model which combined five different connectivity
indices. In a set of halogenated hydrocarbons, Seybold et al. [86APH253] successfully
employed a Q_H descriptor in the regression model. Q_H is an approximation of charges on
polar hydrogens. The approximation, proposed by Di Paolo et al. [77MPH31], was based
on Pauling's electronegativities of carbons connected to the polar hydrogen and
heteroatoms bonded to the carbons. Using the topological QSPR approach, Balaban et al.
[92JCIC233] developed four six-parameter models of similar quality (R² = 0.97) to
correlate the normal boiling points of 532 C1-C4 haloalkanes with their chemical structure.
In a simultaneous study [92JCIC237], the normal boiling points of 185 saturated acyclic
compounds with one or two divalent oxygen or sulfur heteroatoms were correlated with
their chemical structure using topological descriptors and counts of heteroatoms.
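
The Wiener index described above, the sum of topological distances between all pairs of carbon atoms, can be computed from the hydrogen-suppressed carbon skeleton with a breadth-first search from every atom. The sketch below assumes the skeleton is supplied as an adjacency dictionary; the function and variable names are illustrative only.

```python
from collections import deque

def wiener_index(adjacency):
    """Sum of shortest-path bond counts over all unordered atom pairs."""
    total = 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:                      # breadth-first search from source
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total // 2                     # each pair was counted twice

# Carbon skeletons of the two C4H10 isomers.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}

w_n_butane = wiener_index(n_butane)     # 10
w_isobutane = wiener_index(isobutane)   # 9
```

The branched isomer has the smaller index, which is the structural information the descriptor contributes beyond molecular mass.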

For diverse compounds, numerous molecular QSPR characteristics, accounting for
inter- and intramolecular interactions in condensed media in more detail, have been
developed on the basis of physical models of different complexity. Kamlet et al. presented
a theoretically justified correlation equation to predict the boiling points of 80 organic and
inorganic liquids [84JA1205]. The equation dealt with dispersion and dipolar interactions,
which depend on the size, polarizability, and dipolarity of the liquids. The size of the
molecule was described by the square root of the molecular weight, the dipolarity was
represented by the squared dipole moment, and the polarizability was reflected by the
number of atoms adjusted empirically to account for different types of atoms. In this
study, molecules involving H-bonding self-association were modelled by adding a
correction term, Δ(ΔBP), based on the difference between the experimental and calculated
boiling points [84JA1205]. Grigoras [90JCC493] estimated boiling points of
organic compounds using the assumption that the dominant intermolecular interaction was
related to the molecular surface energy derived from the molecular surface area and the
charge density distribution. The corresponding multi-linear model, which gave a good
correlation (R² = 0.958) with the boiling points of 137 diverse organic compounds,
included four parameters: total molecular surface area, the sums of the positively and
negatively charged atomic surface areas multiplied by the corresponding partial charges,
and a hydrogen bonding term. However, to achieve this correlation, atomic charge scaling
factors had to be introduced to correct the partial charges calculated by the extended
Hückel theory. Moreover, the external prediction of 30 diverse compounds had absolute
errors ranging from 0.4 to 47 K.

To encode the structural features responsible for polar intermolecular interactions,
Jurs and Stanton introduced charged partial surface area (CPSA) descriptors, which
combine solvent accessible surface areas with partial atomic charges [90AC2323]. Using
CPSA descriptors in combination with some constitutional, topological and other
descriptors, Jurs and co-workers found correlations with the normal boiling point for five
large sets. Each set contained a single class of heterocycle: furans and tetrahydrofurans
(R² = 0.969) [91JCICS301], thiophenes (R² = 0.974), pyrans (R² = 0.978), pyrroles (R² =
0.962) [92JCICS306], and pyridines (R² = 0.933) [93JCICS616], all of which
demonstrated standard errors ranging from 8 to 15 K for boiling point estimates.
Although the correlation models obtained lacked uniformity and involved different
descriptors, sometimes even class-dependent descriptors for different classes of
compounds, the CPSA descriptors were demonstrated to be useful, especially when
hydrogen-bonding-specific descriptors were added to the descriptor pool [92JCIC306,
93JCIC616]. The utility of the CPSA descriptors was also evident in a model developed
to predict the boiling point for 298 diverse organic compounds (R² = 0.976, s = 11.85 K)
that employed 8 parameters, four of which were CPSA descriptors [94JCIC947].
Employing CPSA, Wessel and Jurs [95JCICS68] recently developed a valuable regression
model (R² = 0.994, s = 6.3 K, n = 296) for hydrocarbons involving six parameters: two
topological connectivity indices, two CPSA descriptors, a partial charge descriptor and the
square root of the molecular weight. In another study, a successful six-parameter model
(R² = 0.948) [84JACS1205] was developed for 85 substituted pyridines by Katritzky et al.
using CPSA descriptors along with the gravitation index (bulk quantity), point-charge
component of the molecular dipole, and nitrogen specific parameters [94CT17].

The QSPR models developed for boiling points up to this stage involved quite a
few descriptors. Yet, Principal Component Analysis (PCA) on six liquid state physical
properties (including boiling points) of 114 diverse compounds revealed only two major
intrinsic dimensions, "bulk" and "polar cohesiveness" [80JACS1837]. On the basis of this
observation, Cramer suggested [80JACS1837] that any liquid behaviour theory which
explained the macromolecular properties in terms of more than two adjustable parameters
was likely to contain redundant information. While the structural identification of those
factors remains unknown, the normal boiling points of 118 external diverse compounds
were predicted by the 2-parameter model with R² of 0.932. Murray et al. actually
managed to use only two descriptors to develop a QSPR model for boiling points of 99
diverse compounds [93JPC9369]. One of the descriptors was the molecular surface area,
and the other was νσ²_tot, derived from the electrostatic potential calculated on molecular
surfaces defined by Bader et al. [87JACS7968]. The parameter νσ²_tot measures the
tendency of a molecule for electrostatic interactions. The correlation coefficient (R) and
the standard deviation were reported as 0.949 (R² = 0.901) and 36.5 K. In a recent
unweighted QSPR model by Le and Weers [95JPC6739], the ratio of the isopotential
molecular volume to isopotential molecular surface area (V/S) was introduced as the
leading parameter to predict the normal boiling point of fluorocarbons. The
dimensionality of the molecular size in this case had been reduced to the first order
(length).

This chapter discusses the possibility of developing a non-redundant QSPR
equation involving the main molecular structural characteristics that determine the normal
boiling point and the relevant inter- and intramolecular interactions in liquids. For this
purpose, the boiling points of diverse compounds will be modelled by uncovering and
developing the molecular descriptors that most closely resemble the intrinsic
dimensionality of the respective interactions. In other words, the relationship between
boiling points and molecular structure is to be established using physically definite
molecular descriptors in a limited parameter space.

Results  and  Discussion 

The  compilation  of  boiling  points  of  298  important  organic  compounds  drawn 
from  the  Design  Institute  for  Physical  Property  Data  (DIPPR)  database  was  chosen  as  the 
complete  set  to  develop  the  QSPR  models.  The  same  data  set  was  previously  used  by  Jurs 
et  al.  [92JCICS306]  in  their  ADAPT  (Automated  Data  Analysis  using  Pattern  Recognition 
Techniques)  QSPR  treatment.  This  set  is  structurally  sufficiently  diverse  and  includes 
saturated  and  unsaturated  hydrocarbons,  halogenated  compounds,  hydroxyl,  cyano, 
amino,  ester,  ether,  carbonyl,  and  carboxyl  functionalities.  Yet,  the  set  is  compact  enough 
to  allow  calculation  of  the  numerous  semi-empirical  quantum-chemical  molecular 
descriptors  employed  in  the  CODESSA  within  a  reasonable  time  frame.  The  structures 
were  drawn  from  scratch  and  pre-optimized  by  the  molecular  mechanics  MMX  method 
using  the  PCMODEL  [92MI]  program.  The  final  geometry  optimization  of  molecules  was 
performed on an IBM RISC/6000 model 320 using the semi-empirical quantum-chemical
AM1 method [85JACS3902] within the MOPAC 6.0 program [90MI]. The MOPAC output
files of individual compounds were loaded into the CODESSA for MS Windows program
[95MIk] along with the boiling point data. The CODESSA program implements
procedures which enable the calculation of a large selection of non-empirical descriptors
described in Appendix I. More than 600 molecular descriptors programmed into
CODESSA  were  calculated  for  all  compounds.  Various  modifications  of  the  original 
descriptors,  as  will  be  discussed  later,  were  calculated  using  CODESSA  descriptor 
construction  facility.  The  correlation  analysis  to  find  the  best  QSPR  model  of  a  given  size 
was  carried  out  using  two  procedures  described  in  Appendix  II.  Both  strategies  yielded 
the  same  best  correlations  which  added  confidence  to  the  reliability  of  the  methodology 
and  the  QSPR  equations  developed. 

In order to avoid the initial complexities that arise from specific (mainly hydrogen-
bonding) interactions between the liquid molecules, the QSPR treatment of the normal
boiling points started from a subset of compounds including only
hydrocarbons. This would allow the elucidation of structural features attributable to
"bulk"  cohesiveness  described  by  Cramer  [80JACS1837].  The  hydrocarbon  subset 
contained  95  C3-C19  alkanes,  cycloalkanes,  alkenes,  alkylarenes,  and  alkynes.  Several 
bulk-related  descriptors  performed  well  for  the  hydrocarbon  set  (Table  2-1). 

Table 2-1. One-Parameter Correlations for the 95 Hydrocarbons

Descriptor                                                 R²
gravitation index for all bonded pairs of atoms, G_I       0.9722
gravitation index for all pairs of atoms, G_p              0.9629
AM1 α-polarizability                                       0.9483
first order Randic index                                   0.9479
molecular weight, MW                                       0.9225
zero order Kier&Hall index, ⁰χᵛ                            0.8334
structural information content                             0.7503
Wiener index                                               0.6007

The best performance was observed for the gravitational index over all bonded atoms i,j in
the molecule [96RCRsub], defined as

$$G_I = \sum_{(i,j)}^{\text{all bonded}} \frac{m_i m_j}{r_{ij}^2} \qquad (2.1)$$

(where m_i and m_j are the atomic masses of the bonded atoms and r_ij denotes the respective
bond lengths). Other important descriptors included (i) the AM1 calculated α-polarizability
of the molecule, (ii) the first order Randic index [75JACS6609], (iii) the molecular weight
(MW), (iv) the zeroth order Kier&Hall index, (v) the structural information content
[88RCR191] and (vi) the gravitational index over all pairs of atoms i,j in the molecule
[96RRCsub], defined as

$$G_p = \sum_{(i,j)}^{\text{all pairs}} \frac{m_i m_j}{r_{ij}^2} \qquad (2.2)$$

(where r_ij denotes interatomic distances). The Wiener index did not significantly correlate
with the boiling point of the hydrocarbon set. More careful examination revealed a
systematic curved deviation from the regression line for most of the
correlations, especially the one involving the Wiener index. Interestingly, when the square
root or cube root of the descriptor was used, the correlations were significantly improved
(Table 2-2).
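
To make the gravitation indices concrete, the sketch below evaluates a sum of m_i m_j / r_ij² either over a supplied list of bonded pairs or, when no bond list is given, over all atom pairs. This functional form is an assumption based on the verbal definitions in the text, and the function name, argument layout and toy geometry are hypothetical; it is not CODESSA code.

```python
def gravitation_index(masses, coords, bonds=None):
    """Sum of m_i * m_j / r_ij**2 over bonded pairs (if given) or all pairs.

    masses: atomic masses; coords: (x, y, z) tuples in the same order;
    bonds: optional list of (i, j) index pairs for bonded atoms.
    """
    n = len(masses)
    if bonds is None:
        pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    else:
        pairs = bonds
    total = 0.0
    for i, j in pairs:
        # Squared Euclidean distance between atoms i and j.
        r_squared = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
        total += masses[i] * masses[j] / r_squared
    return total

# Toy linear three-atom system with made-up masses and positions.
toy_masses = [2.0, 3.0, 4.0]
toy_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
toy_bonds = [(0, 1), (1, 2)]

g_bonded = gravitation_index(toy_masses, toy_coords, bonds=toy_bonds)
g_pairs = gravitation_index(toy_masses, toy_coords)
```

In practice the coordinates would come from the optimized geometry, so the index reflects both the atomic masses and how they are distributed in space.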


Table 2-2. One-Parameter Correlations for the 95 Hydrocarbons (CH) and 137
Hydrocarbons and Halogenated Hydrocarbons (CHX)

Descriptor                                                        R²(CH)   R²(CHX)
square root of the gravitation index, √G_I                        0.9921   0.9647
cube root of the gravitation index, ∛G_I                          0.9875   0.9601
square root of the first order Randic index                       0.9805   0.9217
cube root of the AM1 α-polarizability                             0.9766   0.8852
square root of the gravitation index for all pairs of atoms, √G_p 0.9765   0.9612
square root of the molecular weight, √MW                          0.9537   0.9232
cube root of the Wiener index, ∛w                                 0.9002   0.8529
cube root of the zero order Kier&Hall index                       0.8559   0.8238

This observation suggests that the lower exponential order representation of the molecular
bulk descriptors better describes the related effective inter- and intramolecular interaction,
at least for the boiling points. The gravitational index simultaneously accounts for both
the atomic masses (volumes) and for their distribution within the molecular space. The
different exponential orders of other topological connectivity indexes (Wiener's, Randic's
and Kier&Hall's) or mass (volume) descriptors gave no better one-parameter description
of the normal boiling point of hydrocarbons than the square root √G_I (R² = 0.9921) or
cube root ∛G_I (R² = 0.9875, Figure 2-1) of the gravitation index. However, the
gravitation index did not sufficiently account for the differences among the isomers,
which was the main source of error (average absolute error = 5.4 K) in the predicted
values. This error can be reduced to 4.2 K by including the second order Randic index
(eq 2.3, Figure 2-2, Table 2-2) to account for the branching effects in different isomers of
hydrocarbons.
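
The effect of a root transformation on a one-parameter correlation can be checked by comparing the squared correlation coefficient obtained with the raw descriptor and with its square root. The sketch below uses synthetic values, generated so that the property is exactly linear in the square root of the descriptor; the names and numbers are illustrative and are not data from this study.

```python
import math

def r_squared(xs, ys):
    """Squared Pearson correlation between xs and ys (one-parameter fit)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

# Synthetic descriptor and property values (property linear in sqrt(descriptor)).
descriptor = [50.0, 100.0, 200.0, 400.0, 800.0]
property_values = [10.0 + 20.0 * math.sqrt(d) for d in descriptor]

r2_raw = r_squared(descriptor, property_values)
r2_sqrt = r_squared([math.sqrt(d) for d in descriptor], property_values)
```

A curved residual pattern with the raw descriptor, together with a higher R² after the transformation, is the behaviour reported for the bulk descriptors in Tables 2-1 and 2-2.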


The extension of the hydrocarbon subset by addition of 42 halogenated
hydrocarbons leads to one-parameter correlations employing the square root or cube
root of the gravitation index with R² of 0.9647 and 0.9626, respectively. The best one-
parameter correlations for the boiling point of the set of the 137 hydrocarbons and
halogenated hydrocarbons are summarized in Table 2-2. The second descriptor that can
be added to improve the correlation for this set of compounds was found to be the AM1
calculated most negative atomic charge in the molecule (eq 2.4).


$$T_b = (-24.2 \pm 4.6) + (18.? \pm 0.3)\sqrt{G_I} + (-10.5 \pm 1.4)\,{}^{2}\chi \qquad (2.3)$$

n = 95, R² = 0.9951, F = 9428, s = 5.6, average absolute error = 4.2

$$T_b = (63.5 \pm 9.0) + (15.1 \pm 0.2)\sqrt{G_I} + (222.7 \pm 30.1)\,\delta^{-}_{max} \qquad (2.4)$$

n = 137, R² = 0.9749, F = 2603, s = 12.0


Thus,  the  mass  related  intrinsic  dimensionality  of  intermolecular  interactions,  the  "bulk" 
[80JACS1837],  is  well  represented  by  the  gravitation  index,  while  the  bulk-corrected 
"cohesiveness"  may  vary  with  different  functionalities. 


Figure 2-1. Calculated vs. experimental normal boiling points according to the
best 1-parameter model for the 95 hydrocarbons. (Horizontal axis: experimental boiling point, K.)

Figure 2-2. Calculated vs. experimental normal boiling points according to the
best 2-parameter model for the 95 hydrocarbons. (Horizontal axis: experimental boiling point, K.)

For the complete data set of 298 compounds, the predominant influence of the
gravitation index was diluted due to the involvement of specific interactions (mainly
hydrogen-bonding self-association) for the majority of the compounds. The best one-
parameter correlation with the cube root of G_I yielded an R² of only 0.7753, which was
substantially poorer than those of the corresponding correlations for the limited data sets.

(2.5)

n = 298, R² = 0.7753, F = 1035.2, s = 35.8

Accordingly, a dramatic improvement of the correlations for the whole data set was
observed if one of the hydrogen-bonding related descriptors was added to the QSPR
equation. The best two-parameter equation involved the cube root of G_I and the area-
weighted surface charge of the hydrogen bonding donor atom(s) in the molecule (Table 2-3).
The latter was calculated as:

(2.6)

where q_D is the partial charge on hydrogen bonding donor (H) atom(s), S_D denotes the
surface area for this atom and S_tot is the total molecular surface area, calculated from the
van der Waals' radii of the atoms (overlapping spheres). The summation in Equation (2.6)
was performed over the number of simultaneously possible hydrogen bonding donor and
acceptor pairs per molecule. Also, the hydrogen atoms at the α-position to carbonyl and
cyano groups were counted as possible hydrogen bonding donor centers (their
effectiveness is, of course, much smaller because of the smaller partial charge on them).


Table 2-3. The Best Two-Parameter Correlation of the Normal Boiling Point for the Data
Set of 298 Diverse Structures (R² = 0.9544; s = 16.2; F = 3127)

Descriptor      X ± ΔX           t-test
Intercept       -170.7 ± 7.5     -23.2
∛G_I            65.9 ± 0.9       76.9
HDSA(2)         18470 ± 540      34.3
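
Read as a regression equation, Table 2-3 gives T_b = -170.7 + 65.9 ∛G_I + 18470 HDSA(2). The sketch below simply evaluates that expression. It assumes the second row of the table refers to the cube root of the gravitation index, as stated in the text, and the descriptor values passed in are hypothetical inputs rather than values computed for any real compound.

```python
def predict_tb_two_parameter(g_i, hdsa2):
    """Normal boiling point (K) from the two-parameter model of Table 2-3.

    g_i: gravitation index over bonded atom pairs (not yet cube-rooted);
    hdsa2: area-weighted surface charge of H-bond donor atoms, HDSA(2).
    """
    intercept = -170.7
    coeff_cube_root_g = 65.9
    coeff_hdsa2 = 18470.0
    return intercept + coeff_cube_root_g * g_i ** (1.0 / 3.0) + coeff_hdsa2 * hdsa2

# Hypothetical descriptor values, chosen only to exercise the arithmetic.
tb_no_donor = predict_tb_two_parameter(g_i=8.0, hdsa2=0.0)
tb_with_donor = predict_tb_two_parameter(g_i=8.0, hdsa2=0.01)
```

Because the two descriptors are nearly orthogonal, the hydrogen-bonding term acts as an additive shift on top of the bulk term; here a hypothetical HDSA(2) of 0.01 raises the estimate by 184.7 K.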


The two-parameter equation presented in Table 2-3 is physically highly significant
and demonstrates that two practically orthogonal molecular descriptor scales (the
intercorrelation coefficient between ∛G_I and HDSA(2) is 0.2041) describe most of the
variance of the normal boiling points for a wide variety of organic substances (cf. also
Figure 2-3). Both descriptors have explicit physical meaning, the first (gravitation index)
being connected with the dispersion and cavity-formation effects in liquids, and the second
with the hydrogen bonding ability of compounds.

Further adjustment of the correlation yielded a four-parameter equation with
R² = 0.9732 (Figure 2-4 and Table 2-4) that involved, as additional descriptors, the AM1
most negative atomic partial charge and the number of chlorine atoms in the molecule.
The most negative atomic partial charge accounts for the bulk-corrected cohesiveness, as
discussed above. The chlorine-substituted compounds are distinct outliers from the two-
parameter equation given in Table 2-3 (see also Figure 2-3 and Table 2-5), with
systematically lower predicted normal boiling point values. In principle, this may have
several causes. In the first place, the gravitation index may need to be adjusted to
correct for large atoms, such as chlorine, for which van der Waals radii are comparable
with bond lengths (e.g. 1.80 Å for the Cl atom vs. 1.74 Å for the C-Cl bond). Secondly, a minor
hydrogen bonding contribution may exist for these compounds (for instance, hydrogen-
bonded systems with chloroform are well known). Nevertheless, accounting for the
presence of chlorine atoms contributes substantially to the final QSPR equation developed
(Table 2-4).

Table 2-4. The Best Four-Parameter Correlation of the Normal Boiling Point for the Data
Set of 298 Diverse Structures (R² = 0.9732; s = 12.4; F = 2700)

Descriptor                           X ± ΔX           t-test
Intercept                            -151.3 ± 6.3     -24.1
∛G_I                                 67.4 ± 0.7       101.1
HDSA(2)                              21540 ± 480      45.1
AM1 most negative atomic charge      140.4 ± 13.1     10.8
Number of Cl atoms, N_Cl             17.5 ± 2.3       7.6

Figure 2-3. Calculated vs. experimental normal boiling points according to the
best 2-parameter model for the 298 diverse organic compounds. (Horizontal axis: experimental boiling point, K.)

Figure  2-4.  Calculated  vs.  experimental  normal  boiling  points  according  to  the 
best  4-parameter  model  for  the  298  diverse  organic  compounds. 
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Table  2-5.  The  Comparison  of  the  Experimental  and  Predicted  Normal  Boiling  Points  (K) 
for  the  Set  of  298  Compounds  


No 

Compound 

Tb  (exp) 

Tb(2) 

Tb(4)' 

1 

acrylic  acid 

414.15 

425.43 

422.75 

2 

1 , 1  -dichloropropane 

361.25 

337.48 

373.10 

3 

1 , 1  -diphenylethane 

545.78 

532.55 

539.22 

4 

1,2,3,4- tetrahydronaphthalene 

480.77 

460.33 

472.59 

5 

1 ,2,3-trimethylbenzene 

449.27 

431.39 

438.70 

6 

1 ,2,4-trimethyIbenzene 

442.53 

429.52 

437.45 

7 

1 ,2-dichloropropene 

361.25 

345.82 

380.69 

8 

1 ,2-diphenylethane 

553.65 

532.99 

550.05 

9 

1,2-propylene  glycol 

460.75 

470.04 

478.33 

10 

1,3-butadiene 

268.74 

265.62 

265.93 

11 

1,3-butanediol 

480.15 

473.94 

480.13 

12 

1 ,3-cyclohexadiene 

353.49 

352.17 

361.75 

13 

1 ,3-dichloropropane 

393.55 

338.28 

379.42 

14 

1,3-propylene  glycol 

487.55 

471.32 

481.93 

15 

1,4-butanediol 

501.15 

490.56 

498.53 

16 

1,4-dichlorobutane 

427.05 

366.48 

409.52 

17 

1 ,5-dichloropentane 

453.15 

392.45 

435.61 

18 

1,5-hexadiene 

332.61 

335.79 

334.89 

19 

1,5-pentanediol 

512.15 

503.24 

509.93 

20 

1,6-hexanediol 

516.15 

509.64 

514.00 

21 

1-bromobutane 

374.75 

363.97 

365.08 

22 

1-bromopropane 

344.15 

335.14 

335.73 

23 

1-butene 

266.90 

264.77 

262.52 

24 

1-chlorobutane 

351.58 

320.27 

338.63 

25 

1-chloropentane 

381.54 

350.70 

369.69 

35 
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No 

Compound 

a 

Tb (exp) 

h 

Tb(2) 

c 

Tb(4) 

26 

1-decanol 

503.35 

503.97 

498.87 

27 

1-decene 

443.75 

433.33 

434.27 

28 

1-dodecane 

486.50 

475.78 

477.96 

29 

1-heptene 

366.79 

363.15 

362.00 

30 

1-hexadecene 

558.02 

543.43 

547.64 

31 

1-hexanal 

401.45 

399.23 

395.44 

32 

1-hexanol 

430.15 

429.57 

425.12 

33 

1-hexene 

336.63 

335.23 

334.87 

34 

1-octadecene 

587.97 

572.98 

577.66 

35 

1-octene 

394.44 

390.28 

391.12 

36 

1-pentanol 

410.95 

407.56 

403.28 

37 

1-pentene 

303.11 

302.60 

301.07 

38 

1-tetradecane 

524.25 

511.45 

515.19 

39 

2,2,3,3-tetramethylpentane 

413.44 

411.75 

415.10 

40 

2,2,3-trimethylbutane 

354.03 

361.68 

364.10 

41 

2,2,3-trimethylpentane 

383.(X) 

387.72 

390.41 

42 

2,2,4-trimethylpentane 

372.39 

388.07 

391.24 

43 

2,2-dimethyl- 1  -propanol 

386.25 

408.15 

404.61 

44 

2,2-dimethylbutane 

322.88 

333.24 

334.81 

45 

2,3,3-trimethylpentane 

387.92 

387.86 

390.60 

46 

2,3-butanediol 

453.85 

477.31 

483.76 

47 

2,3-dimethyl- 1 -butene 

328.76 

335.10 

334.02 

48 

2,3-dimethyl-2-butene 

346.35 

336.89 

342.97 

49 

2,3-dimethyl-3-butadiene 

341.93 

336.61 

337.29 

50 

2,3-dimethylbutane 

331.13 

333.59 

335.40 

51 

2,3-dimethylhexane 

388.76 

388.78 

391.52 

36 
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No 

Compound 

a 

Tb  (exp) 

b 

Tb(2) 

c 

Tb(4) 

52 

2,3-dimethylpentane 

362.93 

362.46 

364.61 

53 

2,4,4- trimethyl- 1  -pentene 

374.59 

389.42 

389.50 

54 

2,4,4- trimethyl-2-pentene 

378.06 

389.65 

393.40 

55 

2,6-xylenol 

474.22 

484.37 

490.28 

56 

2-bromobutane 

364.37 

361.72 

363.42 

57 

2-bromopropane 

332.56 

332.70 

333.64 

58 

2-ethylbutyric  acid 

466.95 

All.Ol 

470.70 

59 

2-ethyl-l-butanol 

419.65 

409.78 

402.33 

60 

2-ethyl-l-butene 

337.82 

335.55 

334.33 

61 

2-ethyl-l-hexanol 

457.75 

471.36 

466.98 

62 

2-ethylhexyl  acrylate 

489.15 

509.90 

495.36 

63 

2-hexanol 

413.04 

422.19 

416.35 

64 

2-hexanone 

4(H).85 

399.46 

394.39 

65 

2-methyl- 1  -butanol 

401.85 

382.34 

374.23 

66 

2-methyl- 1  -butene 

304.30 

303.03 

3{X).85 

67 

2-niethyl- 1  -pentanol 

421.15 

432.23 

428.66 

68 

2-methyl- 1  -pentene 

335.25 

335.54 

333.97 

69 

2-methyl-2-butanol 

375.15 

397.31 

391.92 

70 

2-methyl-2-pentene 

340.45 

336.21 

338.22 

71 

2-methyl-3-ethylpentane 

388.80 

388.41 

391.09 

72 

2-methylbutyric  acid 

450.15 

459.75 

453.97 

73 

2-methylhexane 

363.20 

362.83 

364.85 

74 

2-methylpentane 

333.41 

333.66 

334.95 

75 

2-methylpyridine 

402.55 

421.11 

432.38 

76 

2-pentanol 

392.15 

4(K).21 

394.71 

77 

2-pentanone 

375.46 

379.49 

375.38 
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No 

Compound 

a 

Tb (exp) 

Tb(2) 

c 

Tb(4) 

78 

2-propanol 

355.41 

346.26 

341.88 

79 

3,3-diethylpentane 

419.34 

411.98 

415.16 

80 

3,3-diniethyl-  1-butene 

314.40 

333.87 

333.42 

81 

3-chloropropene 

318.11 

286.48 

306.60 

82 

3-hexanone 

396.65 

401.91 

397.99 

83 

3-methyl- 1 -butanol 

404.35 

407.44 

403.44 

84 

3-methyl- 1  -butene 

293.21 

302.01 

301.19 

85 

3-methyl- 1  -pentene 

327.33 

334.55 

334.48 

86 

3-methyl-2-butanol 

384.65 

401.88 

396.78 

87 

3-methyl-2-butene 

311.71 

304.02 

308.83 

88 

3-methylhexane 

365.(X) 

362.82 

364.76 

89 

3-methylpentane 

336.42 

334.11 

335.64 

90 

3-methylpyridine 

417.29 

418.24 

430.53 

91 

3-pentanol 

388.45 

401.78 

396.58 

92 

4-methyl- 1  -pentene 

327.01 

334.79 

333.95 

93 

4-methyI-2-pentanol 

404.85 

424.00 

418.79 

94 

4-methylpyridine 

418.50 

418.34 

429.35 

95 

acetaldehyde 

376.75 

415.9 

407.16 

96 

acetone 

329.44 

322.06 

319.49 

97 

acetophenone 

475.15 

466.93 

464.63 

98 

acetylacetone 

413.55 

411.38 

410.06 

99 

acrylic  aldehyde 

325.84 

343.98 

345.26 

1(X) 

acrylonitrile 

350.50 

343.17 

362.42 

101 

adiponitrile 

568.15 

517.07 

546.67 

102 

a-methylstyrene 

438.65 

430.43 

432.67 

103 

allyl  acetate 

377.15 

389.37 

373.53 
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No 

Compound 

Tb  (exp) 

Tb(2) 

Tb(4) 

104 

allyl  alcohol 

370.23 

355.69 

353.09 

105 

allylamine 

326.45 

325.54 

317.37 

106 

aniline 

457.60 

445.26 

441.43 

107 

benzaldehyde 

451.90 

440.16 

436.92 

108 

benzene 

353.24 

353.94 

367.06 

109 

benzoic  acid 

522.40 

517.81 

512.72 

110 

benzyl  acetate 

486.65 

486.61 

471.24 

111 

benzyl  alcohol 

477.85 

476.17 

473.85 

112 

benzyl  benzoate 

596.65 

576.62 

563.43 

113 

bicyclohexyl 

512.19 

493.30 

506.29 

114 

bromobenzene 

429.24 

432.48 

442.38 

115 

butyl  vinyl  ether 

366.97 

378.94 

372.11 

116 

chlorobenzene 

404.87 

397.38 

429.25 

117 

cis- 1 ,2-dimethylcyclohexane 

402.94 

401.34 

404.80 

118 

cis- 1 ,3-dimethylcyclohexane 

393.24 

401.50 

404.93 

119 

cis- 1 ,4-dimethylcyclohexane 

397.47 

401.31 

404.76 

120 

cis-2-butene 

276.87 

266.07 

269.58 

121 

cis-2-hexene 

342.03 

336.12 

337.55 

122 

cumene 

425.56 

428.10 

431.79 

123 

cyclohexane 

353.87 

349.55 

359.40 

124 

cyclohexanol 

434.(X) 

449.02 

446.48 

125 

cyclohexanone 

428.90 

418.73 

414.87 

126 

cyclohexylamine 

407.65 

423.85 

416.35 

127 

cyclopentadiene 

314.65 

318.54 

323.60 

128 

cyclopentane 

322.40 

317.91 

326.57 

129 

cyclopentene 

317.38 

318.27 

324.41 

39 
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No 

Compound 

a 

Tb  (exp) 

h 

Tb(2) 

c 

Tb(4) 

130 

cylohexene 

356.12 

350.90 

358.93 

131 

di-n-butyl  ether 

413.44 

425.34 

419.04 

132 

di-n-hexyl  ether 

498.85 

502.92 

498.50 

133 

di-n-propyl  ether 

362.79 

377.53 

369.89 

134 

di-n-propylamine 

382.(X) 

408.15 

404.00 

135 

dibutyl  phthalate 

613.15 

636.69 

623.95 

136 

dibutyl  sebacate 

622.15 

648.02 

643.77 

137 

diethyl  ether 

308.58 

319.69 

311.29 

138 

diethyl  ketone 

375.14 

373.59 

369.52 

139 

J •    »i      1      lit     1  i 

diethyl  phthalate 

567.15 

580.26 

567.20 

140 

diethylamine 

328.60 

357.15 

351.52 

141 

diisopropyl  ether 

341.45 

376.48 

368.65 

142 

diisopropylamine 

357.05 

404.15 

398.85 

143 

dimethyl  phthalate 

556.85 

555.73 

542.68 

t  A  A 

144 

dimethyl  terephthalate 

561.15 

548.53 

535.92 

1  /I  c 

145 

diphenyl  ether 

531.46 

526.20 

535.86 

t  A^ 

146 

diphenylamine 

575.15 

558.24 

564.87 

147 

diphenylmethane 

537.42 

516.89 

533.64 

148 

divinyl  ether 

301.45 

322.72 

315.87 

149 

ethyl  acetate 

350.21 

361.83 

343.81 

150 

ethyl  acrylate 

372.65 

387.52 

370.72 

151 

ethyl  benzoate 

486.55 

486.42 

470.49 

152 

ethyl  formate 

327.46 

332.60 

321.92 

153 

ethyl  isobutyrate 

383.(X) 

411.64 

394.42 

154 

ethyl  isopropyl  ketone 

386.55 

400.07 

396.43 

155 

ethyl  n-butyrate 

394.65 

412.14 

395.28 

Table 2-5 Continued.

No   Compound                        Tb(exp)^a Tb(2)^b   Tb(4)^c
156  ethyl propionate                372.25    388.07    370.76
157  ethyl propyl ether              337.01    350.35    342.15
158  ethyl vinyl ether               308.70    321.38    303.14
159  ethylbenzene                    409.35    405.97    409.11
160  ethylcyclohexane                404.95    401.03    404.10
161  ethylcyclopentane               376.62    376.05    378.68
162  hexamethylene imine             404.85    425.63    424.05
163  hexanenitrile                   436.75    397.98    405.10
164  iso-pentane                     300.99    301.34    302.21
165  isobutane                       261.43    263.22    263.52
166  isobutanol                      380.81    383.66    380.60
167  isobutene                       266.25    265.20    261.54
168  isobutyl acetate                389.80    411.77    394.62
169  isobutyl acrylate               405.15    434.49    417.87
170  isobutyl formate                371.22    387.09    376.95
171  isobutyl isobutyrate            420.65    454.30    437.96
172  isobutylamine                   340.88    360.72    351.56
173  isobutylbenzene                 445.94    449.20    453.46
174  isobutyraldehyde                337.25    344.59    340.70
175  isobutyric acid                 427.85    439.26    434.01
176  isobutyronitrile                376.76    347.55    354.89
177  isophorone                      488.35    486.97    484.14
178  isoprene                        307.21    303.67    303.83
179  isopropyl acetate               361.65    387.95    370.19
180  isopropyl chloride              308.85    283.95    299.92
181  isopropylamine                  305.55    321.89    312.98
182  isovaleric acid                 448.25    458.94    453.05
183  m-cresol                        475.43    478.56    486.64
184  m-diethylbenzene                454.29    450.10    454.27
185  m-diisopropylbenzene            476.33    487.63    493.32
186  m-ethyltoluene                  434.48    429.07    432.82
187  m-toluidine                     476.55    465.61    461.15
188  m-xylene                        412.27    406.63    414.11
189  mesityl oxide                   402.95    409.89    406.09
190  mesitylene                      437.89    429.59    437.69
191  methacrolein                    341.15    360.72    358.52
192  methyl acrylate                 353.35    362.23    344.00
193  methyl ethyl ketone             352.79    352.93    349.48
194  methyl acetate                  330.09    332.87    314.20
195  methyl ethyl ether              280.50    284.85    275.60
196  methyl isobutyl                 389.65    398.70    393.68
197  methyl isobutyl ether           331.70    350.10    342.41
198  methyl isopropenyl ketone       371.15    382.56    380.67
199  methyl isopropyl ether          323.75    319.48    310.35
200  methyl isopropyl ketone         367.55    378.47    374.98
201  methyl n-butyrate               375.90    388.15    371.24
202  methyl propionate               352.60    362.09    343.95
203  methyl sec-butyl ether          332.15    349.82    341.93
204  methyl tert-butyl ether         328.35    349.31    341.72
205  methyl tert-pentyl ether        359.45    376.82    368.99
206  methyl vinyl ether              278.65    286.71    277.52
207  methylal                        315.00    337.61    332.16
208  methylcyclohexane               374.08    376.73    379.83
209  methylcyclopentadiene           345.93    349.18    354.42
210  methylcyclopentane              344.96    348.47    351.09
211  N,N-dimethylaniline             466.69    461.90    464.93
212  n-butane                        272.65    263.62    263.51
213  n-butanol                       390.81    383.98    380.36
214  n-butyl acetate                 399.15    412.09    395.11
215  n-butyl acrylate                421.00    434.84    420.11
216  n-butyl ethyl ether             365.35    377.64    370.26
217  n-butyl formate                 379.25    387.70    378.19
218  n-butyl stearate                623.15    660.04    649.17
219  n-butylamine                    350.55    359.40    349.42
220  n-butylbenzene                  456.46    449.68    453.80
221  n-butylcyclohexane              454.13    446.35    449.70
222  n-butyraldehyde                 347.95    351.58    348.33
223  n-butyric acid                  436.42    420.76    421.56
224  n-butyronitrile                 390.75    346.21    352.37
225  n-decane                        447.30    432.19    433.67
226  n-dodecane                      489.47    475.44    480.17
227  n-heptane                       371.58    363.09    365.26
228  n-hexadecane                    560.01    542.97    549.27
229  n-hexane                        341.88    334.18    335.50
230  n-hexanoic acid                 478.85    474.62    467.94
231  n-hexylamine                    404.65    413.15    403.45
232  n-nonadecane                    603.05    586.37    593.70
233  n-nonane                        423.97    412.92    415.45
234  n-octadecane                    589.86    572.43    579.42
235  n-octane                        398.83    389.60    392.36
236  n-pentane                       309.22    301.68    302.42
237  n-pentyl formate                406.60    411.91    402.86
238  n-pentylamine                   377.65    387.64    377.78
239  n-propanol                      370.35    352.70    349.39
240  n-propionaldehyde               321.15    319.46    316.39
241  n-propyl acetate                374.65    378.51    374.60
242  n-propyl chloride               319.67    285.62    303.05
243  n-propyl formate                353.97    361.72    343.97
244  n-propyl propionate             395.65    411.89    395.16
245  n-propylamine                   321.65    330.26    320.70
246  n-propylcyclohexane             432.39    428.68    432.16
247  n-propylcyclohexane             429.90    424.49    428.02
248  n-propylcyclopentane            404.11    400.97    403.91
249  n-tetradecane                   526.73    511.27    514.68
250  neopentane                      282.65    300.48    302.22
251  neopentyl glycol                483.00    476.02    479.41
252  o-cresol                        464.15    478.19    486.5
253  o-dichlorobenzene               453.57    435.42    486.77
254  o-diethylbenzene                456.61    449.98    454.15
255  o-ethyltoluene                  438.33    429.01    432.85
256  o-toluidine                     473.55    455.39    450.10
257  o-xylene                        417.58    406.46    413.89
258  p-cresol                        475.13    473.46    480.87
259  p-cymene                        450.28    449.54    453.80
260  p-diethylbenzene                456.94    450.10    454.28
261  p-diisopropylbenzene            483.65    487.76    493.49
262  p-ethyltoluene                  435.16    429.06    432.63
263  p-hydroquinone                  558.15    545.17    563.59
264  p-toluidine                     473.40    463.68    459.38
265  p-xylene                        411.51    406.61    414.35
266  phenol                          454.99    457.50    465.71
267  piperidine                      379.55    402.81    403.09
268  propane                         231.11    217.37    216.19
269  propionic acid                  414.32    417.96    413.73
270  propionitrile                   370.50    318.84    326.42
271  propylene                       225.43    218.75    215.44
272  pyridine                        388.41    362.07    368.29
273  quinoline                       510.75    467.81    476.11
274  sec-butyl acetate               385.15    411.68    394.43
275  sec-butyl alcohol               372.70    374.98    370.33
276  sec-butyl chloride              341.25    319.17    336.30
277  sec-butylamine                  336.15    347.82    338.40
278  sec-butylbenzene                446.48    449.26    453.08
279  stearic acid                    648.35    640.23    631.57
280  styrene                         418.31    406.51    409.82
281  tert-butyl acetate              369.15    410.93    393.63
282  tert-butyl alcohol              355.57    369.72    364.45
283  tert-butyl chloride             323.75    317.87    335.60
284  tert-butylamine                 317.55    350.86    342.49
285  tert-butylbenzene               442.30    448.58    453.45
286  tetrahydrofuran                 338.00    334.67    325.96
287  toluene                         383.78    381.46    388.34
288  trans-1,2-dimethylcyclohexane   396.58    401.21    404.66
289  trans-1,3-dimethylcyclohexane   397.61    401.19    404.62
290  trans-1,4-dimethylcyclohexane   392.51    401.20    404.65
291  trans-2-butene                  274.03    265.94    269.42
292  trans-2-hexene                  341.02    336.11    337.55
293  trans-crotonic acid             458.15    421.22    422.45
294  trimethylamine                  276.02    283.59    276.37
295  valeraldehyde                   376.15    377.87    373.61
296  valeric acid                    458.65    456.73    450.63
297  valeronitrile                   414.45    372.44    378.65
298  vinyl acetate                   345.65    363.10    346.58

a Experimental value from DIPPR [94JCICS947].
b Calculated according to the 2-parameter correlation (Table (2.4)).
c Calculated according to the 4-parameter correlation (Table (2.5)).


It should be emphasized that the equation in Table 2-4 is characterized by almost
the same correlation coefficient and standard deviation as the 8-parameter equation
developed by Jurs et al. [92JCICS306] for a subset of 268 compounds of the set used in
this work (R = 0.987 vs. 0.988 and s = 12.41 vs. 11.85). However, the number of
correlation parameters has been halved, and all four of them have a distinct physical
meaning. The stability of the correlation equations presented in Tables 2.4 and 2.5 is
characterized by the corresponding cross-validated correlation coefficients [95MIk],
which are almost identical to the correlation coefficients themselves (R²cv = 0.9534 and
0.9719, respectively). Also, the experimental uncertainties which accompany the set of
data used in this work have been established by the DIPPR and estimated at 2.1% error in
boiling point values [94JCICS947]. The average prediction error of the best 2-parameter
correlation (Table 2-3) is 3.0% and that of the best 4-parameter correlation (Table 2-4) is 2.3%.
Consequently, the uncertainty of the four-descriptor equation matches the experimental
imprecision of the data, and therefore no statistically better model representation of the data
used is to be expected.
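
The percentage error quoted above can be illustrated on a handful of rows. The following is a minimal sketch that takes five rows from Table 2-5 and computes the mean relative deviation for each model; it assumes that "average prediction error" means the mean of |calculated - experimental| / experimental, which the text does not state explicitly, and it is not intended to reproduce the 3.0% and 2.3% figures obtained for the full set of 298 compounds.

```python
# Five rows copied from Table 2-5; all temperatures in K.
ROWS = [
    # (compound, Tb_exp, Tb_2param, Tb_4param)
    ("toluene",        383.78, 381.46, 388.34),
    ("trimethylamine", 276.02, 283.59, 276.37),
    ("valeraldehyde",  376.15, 377.87, 373.61),
    ("valeric acid",   458.65, 456.73, 450.63),
    ("vinyl acetate",  345.65, 363.10, 346.58),
]


def mean_percent_error(experimental, calculated):
    """Average of |calc - exp| / exp over all compounds, in percent.

    Assumption: this is the error measure meant by "average prediction error".
    """
    ratios = [abs(c - e) / e for e, c in zip(experimental, calculated)]
    return 100.0 * sum(ratios) / len(ratios)


exp_values = [row[1] for row in ROWS]
err_2param = mean_percent_error(exp_values, [row[2] for row in ROWS])
err_4param = mean_percent_error(exp_values, [row[3] for row in ROWS])
print(f"2-parameter: {err_2param:.2f}%   4-parameter: {err_4param:.2f}%")
```

For this small sample the 4-parameter model gives the lower relative error, in line with the trend reported for the whole data set.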

Although a large number of descriptors were screened, the possibility of having
derived chance correlations can be discounted. Each descriptor added to the models was
checked for significance by the t-test and by the 1-parameter equation for this descriptor.
Cross-validation was also carried out. In addition, the good predictions found for the
test set (see below) are the best proof that the correlations do not arise by chance.
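
The cross-validation check can be sketched for the simplest case, a 1-parameter equation. The code below is an illustration under stated assumptions: it uses leave-one-out cross-validation (the text does not say which scheme was used), and the data are synthetic numbers generated only for the example, not descriptor values from this work.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def total_sum_of_squares(ys):
    my = sum(ys) / len(ys)
    return sum((y - my) ** 2 for y in ys)


def r_squared(xs, ys):
    """Squared correlation coefficient of the fitted line."""
    a, b = fit_line(xs, ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_res / total_sum_of_squares(ys)


def loo_cv_r_squared(xs, ys):
    """Leave-one-out R2: each point is predicted by a fit that excludes it."""
    press = 0.0
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    return 1.0 - press / total_sum_of_squares(ys)


# Synthetic example data (hypothetical descriptor x, hypothetical property y).
rng = random.Random(0)
xs = [rng.uniform(2.0, 8.0) for _ in range(40)]
ys = [-160.0 + 65.0 * x + rng.gauss(0.0, 15.0) for x in xs]

r2 = r_squared(xs, ys)
r2_cv = loo_cv_r_squared(xs, ys)
print(f"R2 = {r2:.4f}, R2cv = {r2_cv:.4f}")
```

A cross-validated value that stays close to the fitted value, as reported for the equations in this chapter, indicates that no single compound dominates the regression.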

The power of the approach was tested by reference to a set of compounds of
completely different character: nine simple inorganics containing one or two atoms of first
row elements (Table 2-6). In this severe test, the two-parameter equation performed
credibly, with an average deviation of 22°, supporting the basic approach which allocates
seminal influence to a gravitational index function and a surface charge function. The 3-
and 4-parameter equations gave substantially less satisfactory results for this
set of 9 inorganics, which indicates that in a truly general treatment, the third
and fourth descriptors need to be modified. The gravitation index and the surface area
weighted hydrogen donor charges are also the most important descriptors in the correlation
with critical temperatures (Table 2-7, Figure 2-5).


Table 2-6. Comparison of QSPR Predicted and Experimental Normal Boiling Points
Tb (K) for Some Inorganic Substances

Compound   Tb(exp)^a  Tb(2)^b   Tb(3)^c   Tb(4)^d
H2O        373        371       400       372
H2O2       425        493       545       543
NH3        240        267       267       270
N2H4       387        357       376       375
NH2OH      330        339       347       350
HCN        299        310       338       333
CH3F       195        179       185       181
CH3NH2     267        250       241       241
HF         293        296       338       324

a Experimental value [94JCICS947].
b Calculated according to the best 2-parameter correlation (Table (2.4)).
c Calculated according to the 3-parameter correlation (cube root of the gravitation index, the area-
weighted surface charge of the hydrogen bonding donor atom(s) in the molecule, and the AM1
most negative atomic partial charge).
d Calculated according to the 4-parameter correlation (Table (2.5)).
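
The comparison of the three equations on the inorganic test set can be checked directly from Table 2-6. The sketch below computes the mean absolute deviation for each column; it assumes that the four numbers listed for each compound are, in order, the experimental value and the 2-, 3- and 4-parameter predictions. The 2-parameter result comes out close to, though not identical with, the 22° quoted in the text, whose exact averaging procedure is not stated.

```python
# Values transcribed from Table 2-6; all temperatures in K.
TABLE_2_6 = {
    # compound: (Tb_exp, Tb_2param, Tb_3param, Tb_4param)
    "H2O":    (373, 371, 400, 372),
    "H2O2":   (425, 493, 545, 543),
    "NH3":    (240, 267, 267, 270),
    "N2H4":   (387, 357, 376, 375),
    "NH2OH":  (330, 339, 347, 350),
    "HCN":    (299, 310, 338, 333),
    "CH3F":   (195, 179, 185, 181),
    "CH3NH2": (267, 250, 241, 241),
    "HF":     (293, 296, 338, 324),
}


def mean_abs_deviation(column):
    """Mean |calculated - experimental| for prediction column 1, 2 or 3."""
    devs = [abs(row[column] - row[0]) for row in TABLE_2_6.values()]
    return sum(devs) / len(devs)


mad_2param = mean_abs_deviation(1)
mad_3param = mean_abs_deviation(2)
mad_4param = mean_abs_deviation(3)
print(f"2-param: {mad_2param:.1f} K, 3-param: {mad_3param:.1f} K, "
      f"4-param: {mad_4param:.1f} K")
```

On these nine compounds the 2-parameter equation has the smallest mean deviation, which agrees with the observation that the third and fourth descriptors transfer poorly to simple inorganics.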


Table 2-7. The Best Three-Parameter Correlation of the Critical Temperature for the
Data Set of 165 Diverse Structures (R² = 0.9597; s = 17.0; F = 1335)

Descriptor  X ± ΔX  t-test

Intercept  85.9  ±  8.0  10.7 

22.9  ±  0.4  58.4 

HDSA(2)  29320  ±  984  29.8 

Topological  electronic  index  all  bonds  -147.3  ±  8.3  -17.7 
Figure 2-5. Calculated vs. experimental critical temperatures according to the
best 3-parameter model for the 165 diverse organic compounds.
Most of the structural descriptors used in the correlation equations were calculated from
the AM1 optimized molecular geometries and charge distribution. To study the
dependence of the descriptors and correlations obtained on the quantum chemical method
used, the calculation of the descriptors and the regression treatment were repeated using PM3
[89JCC209] quantum-chemical optimized geometries and charge distribution of the
molecules. The resulting QSPR correlation equations were similar to those obtained using
AM1 molecular descriptors. The best 2-parameter correlation with PM3 descriptors has
the following form:

\[ T_b = (-160.1 \pm 8.0) + (64.96 \pm 0.94)\,\sqrt[3]{G_I} + (20317 \pm 673)\,\mathrm{HDSA}(2) \tag{2.7} \]
n = 298, R² = 0.9445, F = 2544.5, s = 17.81

and the best 4-parameter correlation is represented as follows:

\[ T_b = (-161.7 \pm 7.0) + (66.65 \pm 0.82)\,\sqrt[3]{G_I} + (22999 \pm 641)\,\mathrm{HDSA}(2) + (73.68 \pm 9.53)\,\delta^{-} + (17.88 \pm 6.39)\,N_{Cl} \tag{2.8} \]
n = 298, R² = 0.9598, F = 1772, s = 15.21

where \delta^{-} denotes the most negative atomic partial charge and N_{Cl} is the number of
chlorine atoms in the molecule. The statistical fit of these equations is slightly worse than
that of the same equations with AM1 descriptors (Tables 2.4 and 2.5). Nevertheless, the
equations obtained seem to have little dependence on the quantum chemical method used
to obtain the molecular geometry and charge distribution.
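
For readers who wish to apply the PM3-based correlations, the following minimal sketch evaluates Equations (2.7) and (2.8) from precomputed descriptor values. The function and argument names are illustrative assumptions, the coefficients are the central values read from the equations above (uncertainties are ignored), and the descriptor values in the example call are hypothetical rather than computed for a real molecule.

```python
def tb_pm3_two_param(g_i, hdsa2):
    """Eq. (2.7): normal boiling point (K) from G_I and HDSA(2).

    g_i   -- gravitation index (must be positive; its cube root is used)
    hdsa2 -- area-weighted surface charge of hydrogen-bonding donor atoms
    """
    return -160.1 + 64.96 * g_i ** (1.0 / 3.0) + 20317.0 * hdsa2


def tb_pm3_four_param(g_i, hdsa2, q_min, n_cl):
    """Eq. (2.8): adds the most negative partial charge and the Cl count.

    q_min -- most negative atomic partial charge (a negative number)
    n_cl  -- number of chlorine atoms in the molecule
    """
    return (-161.7
            + 66.65 * g_i ** (1.0 / 3.0)
            + 22999.0 * hdsa2
            + 73.68 * q_min
            + 17.88 * n_cl)


# Hypothetical descriptor values, chosen only to exercise the functions.
example_2 = tb_pm3_two_param(g_i=1000.0, hdsa2=0.0)
example_4 = tb_pm3_four_param(g_i=1000.0, hdsa2=0.0, q_min=-0.2, n_cl=1)
print(f"Eq. 2.7: {example_2:.1f} K, Eq. 2.8: {example_4:.1f} K")
```

Because the cube root of the gravitation index carries by far the largest t-value, small errors in the remaining descriptors change the predicted boiling point comparatively little.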

Both AM1 and PM3 charge-dependent descriptors were calculated proceeding
from the Mulliken atomic partial charges. For comparison, the AMPAC [94MI]
electrostatic potential charges were applied in the calculation of the charge-dependent
descriptors. However, the statistical fit of the corresponding regression equations was
worse, and therefore the AM1 Mulliken charges seem to operate more adequately in the
QSPR description of the normal boiling points of compounds.


2.3  Conclusions 


The QSPR two- and four-parameter models allow the prediction of boiling points
of structurally diverse organic compounds with a standard error of 16.2 K and 12.4 K,
respectively. The one- and two-parameter models derived for various hydrocarbons have
an average error of 5.4 K and 4.2 K, respectively. The models are theoretically justified and
provide significant additional insight into the relationship between the structures and the
boiling points of the compounds. The overlapping and redundant descriptor space in
representing inter- and intramolecular interactions is eliminated by smoothing out the
nonlinear dependence of the property on a descriptor. The cube root of the gravitation
index most adequately reflects the molecular size-dependent bulk effects (dispersion and
cavity-formation), whereas the second most important parameter, the area weighted
surface charge of hydrogen bonding donor atoms, is directly related to the specific
hydrogen-bonding interactions in the molecule.

The successful description of the normal boiling points with a small number of
physically significant molecular descriptors encourages the application of the same
methodology for other solvation-related phenomena, such as the solubilities of gases in
different media, and solvent effect parameters derived from experimental thermodynamic,
kinetic and spectroscopic observations.


CHAPTER  III 

MODELING  THE  SOLUBILITY  OF  DIVERSE  GASES  AND  VAPORS 
IN  WATER  AND  HEXADECANE  USING 
THEORETICAL  PARAMETERS 

Introduction 

The solubility of non-electrolytes in water and aqueous solutions is of significant
chemical and thermodynamical interest as well as of great practical importance. Notably,
the rationalization and modeling of the environmental fate of pollutants, including the
prediction of soil/sediment adsorption coefficients and bioconcentration factors for non-
ionic pesticides from aqueous solutions, requires a knowledge of their solubilities together
with other properties [81JAFC1059]. Biomedical applications of aqueous solubilities
include their use in predictions of the suitability of gaseous anesthetics, blood substitutes,
and oxygen carriers [91S1323, 93IML133].

Aqueous solutions, including the aqueous solutions of gases, have many
peculiarities not generally observed for other solvents [75JOC292]. The tendency of a
molecule to pass from the gas phase to a dilute aqueous solution has been proposed as a
measure of the hydrophilic character of the gas [75JOC292]. Solute-solvent interactions
determine the solubilities of gases, liquids or solids in a given liquid or solution. However,
the solubilities of gases are more directly related to the condensed phase solute-solvent
interactions than the solubilities of liquids or solids, as the latter also involve the solute-
solute interactions. Consequently, gas solubilities provide a more appropriate
experimental characteristic for the construction of theoretical models of solute-solvent
interactions in condensed media.

\[ L = \frac{\text{concentration of solute in solution}}{\text{concentration of solute in the gas phase}} \tag{3.1} \]

\[ H = \frac{\text{fugacity of solute in the gas phase}}{\text{activity of solute in solution}} \tag{3.2} \]

The Ostwald solubility coefficient (L) is defined as the ratio of the equilibrium
concentrations of a gaseous compound in the liquid and in the gas phase (eq 3.1), where a
superscript w (L^w) usually denotes water as the solvent. Another commonly used gas
solubility parameter is the Henry's law constant H (eq 3.2), which is approximately equal
to 1/L^w. Analytical constraints in dealing with concentrations at the nanogram level limit the
availability of reliable experimental solubility data for many chemical compounds.
Therefore it is of great practical importance to develop theoretical approaches for the
accurate prediction of the solubilities of gases in liquids.
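
The two solubility measures can be made concrete with a short calculation. The sketch below computes the Ostwald coefficient of eq 3.1 and its logarithm from two equilibrium concentrations, and then the corresponding Henry's law constant under an idealized assumption: that fugacity and activity may both be replaced by concentrations in the same units, in which case eq 3.2 reduces to the reciprocal of eq 3.1. The function names and the concentrations in the example are hypothetical.

```python
import math


def ostwald_coefficient(c_solution, c_gas):
    """Eq. (3.1): equilibrium concentration in solution over that in the gas.

    Both concentrations must be in the same units (e.g. mol/L).
    """
    if c_solution <= 0 or c_gas <= 0:
        raise ValueError("concentrations must be positive")
    return c_solution / c_gas


def henry_constant_ideal(c_solution, c_gas):
    """Eq. (3.2) with fugacity and activity approximated by concentrations.

    Assumption: ideal behaviour in both phases, so H reduces to 1/L.
    """
    return 1.0 / ostwald_coefficient(c_solution, c_gas)


# Hypothetical equilibrium concentrations in mol/L.
L_w = ostwald_coefficient(c_solution=2.0e-3, c_gas=4.0e-2)
log_L_w = math.log10(L_w)
H_ideal = henry_constant_ideal(c_solution=2.0e-3, c_gas=4.0e-2)
print(f"L = {L_w:.3f}, log L = {log_L_w:.3f}, H (ideal) = {H_ideal:.1f}")
```

Correlations in this chapter are expressed in terms of log L^w, so a solute that partitions more strongly into water has a larger (more positive) value.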

Rigorous statistical mechanical treatments of gas-liquid solubilities are based on
the calculation of the chemical potential of the solute in the gas and in the solvent phases
from the respective partition functions, which should account for all the relevant energetic
and probabilistic factors. The classical approximation associates the solubility with the
average interaction potential of a solute with the entire solvent at some coupling strength
[91S1323, 77CR219]. However, the corresponding equations are cumbersome and
require the parameterization of the solute-solvent and solvent-solvent interaction
potentials. Accurate calculation of the solvation free energy, or even the calculation of the
relative free energies of solvation of structurally related compounds, using molecular
dynamics or Monte Carlo simulation methods [84JPC6548, 86JPC795, 89CP193] is
arduous for several reasons, including the very large size of the configuration space
accessible to solvent molecules, and the approximate nature of the intermolecular force
fields used.

Therefore, continuous efforts have been made to derive analytical expressions for
the solubility using simplified physical models. Such theoretical tools, developed primarily
for the description of hydrophobic effects, include the classical electrostatic models of
Onsager [33CR73] and Kirkwood [35JCP300], scaled particle theory (SPT) [91S1323,
77CR219, 63JPC1840], and semi-empirical perturbation theory [77CR219, 82JPC4094].
A widely used approach developed by Eley [39TFS1281, 39TFS1421] is based on the
division of the solution process into two consecutive steps, of which the first is associated
with the cavity formation in the solvent and the second accounts for the physical
interactions between a solute molecule immersed in the cavity and the surrounding
solvent molecules. The first step reflects the short-range repulsive part of the potential,
whereas the second step involves the attractive long-range solute-solvent interaction
potential. In the electrostatic approach [33CR73, 35JCP300], the excess chemical
potential of a solute molecule in a solvent is expressed as an integral involving the
solvent-solute pair distribution function and the solvent-solute interaction potential. This
technique is also referred to as the charging technique and has been successful in studying
solutions in which the solutes are relatively small.

Scaled particle theory (SPT) [91S1323, 77CR219, 63JPC1840] characterizes the
solvent using a hard-sphere molecular diameter obtained from the heat of vaporization and
the thermal expansivity of the solvent. The SPT technique has also been extended to real
fluids and applied to the solubilities of gases in liquids [91S1323, 77CR219, 95BJ835].
Other statistical-mechanical models of gas solubility include the thermodynamic statistical
model, based on the distribution of molecular populations among quantized discrete
energy levels [94JPC626, 95JSC703], and the statistical thermodynamic lattice-gas model
[95PAC881].

While  these  theoretical  approaches  to  describe  and  understand  the  processes 
involving  solubility  and  the  underlying  effects  are  well  established  from  the  viewpoint  of 
statistical  mechanics  and  thermodynamics,  it  is  still  difficult  to  quantify  the  solubilities  of 
most  real  systems  from  such  physical  models.  In  fact,  the  application  of  these  theoretical 
models  has  been  restricted  mainly  to  the  calculation  of  the  solubilities  of  the  inert  gases 
and  of  small  gaseous  hydrocarbons.  The  application  to  larger  and  more  complex 
molecules  has  not  been  successful  due  to  the  limitations  in  the  model,  particularly  as 
regards  the  complexity  of  the  estimation  of  the  appropriate  entropic  contributions  to  the 
Gibbs  free  energy  of  solvation. 

An alternative approach to the prediction of gas solubilities in liquids is based on
QSPR. Although this approach has been successfully applied for the correlation of many
diverse physical properties of chemical compounds [Chapter I], relatively few attempts
have been made to correlate or to predict the solubility of gases in water. Hine and
Mookerjee [75JOC292] first reported empirically based group and bond contribution
schemes. Their group contribution scheme reproduced the solubility values of 292 diverse
compounds with a standard error of 0.12 log units using 69 group contribution factors.
The alternative bond contribution scheme reproduced the solubility values of 263 diverse
solutes with a standard error of 0.42 log units using 34 different bond types. A group
contribution scheme was also developed by Cabani et al. [81JSC563], who derived 28
group contributions to reproduce 209 log L^w values of diverse compounds to within 0.09
log units. Because of the large number of groups or bonds involved in these schemes,
neither the group-contribution method nor the bond-contribution method conveys much
understanding of the physical nature of the relationship between the molecular structure
and interactions and the solubility of gases in water. Moreover, their application is
restricted to the prediction of solubilities of compounds containing only structural
functionalities that were included in the original set.

Nirmalakhandan and Speece [88EST1349] developed a QSPR predictive model
involving three structurally determined descriptors: the valence connectivity index
(introduced by Kier and Hall [86MIk]), a molecular polarizability descriptor [82MIh],
and an indicator variable, I, for the presence of an electronegative atom. The solubility
data for 180 diverse compounds were reproduced with a standard error of 0.262 log units.
However, the polarizability descriptor was calculated on the basis of an atomic
contribution scheme involving another 11 empirical parameters, which effectively
increases the number of parameters employed in the correlation.

Later, Russell et al. [92AC1350] correlated the logarithms of the Henry's law
constant, log H, of 63 diverse gases in water, using five theoretically calculated
descriptors. The linear regression model obtained had a correlation coefficient of 0.978, a
standard deviation of 0.375 log units, and an F-value of 250. It was suggested by the
authors that the factors influencing the solubility of gases in water were related to the
solute bulk, the lipophilicity and the polarizability. However, the data set used
[92AC1350] was rather limited due to the relatively small number and structural similarity
of compounds included.

Most recently, Abraham et al. [94JCSPTII1799] predicted the solubility of 408
diverse gases in water with five LSER (Linear Solvation Energy Relationship) descriptors,
comprising the excess molar refraction R_2, calculated from the experimental molar
refraction, the experimentally determined dipolarity/polarizability π_2^H and effective
hydrogen-bond acidity Σα_2^H and basicity Σβ_2^H, and the McGowan characteristic volume
V_x (calculated from tabulated atomic increments). The regression model had an R of
0.9976 with a standard deviation of 0.151 log units and an F-value of 16810. Although
this correlation equation can be interpreted term-by-term using well-established chemical
principles, the LSER descriptors are limited in their ability to make a priori predictions
since four experimentally determined values are required for each compound. Also, since
the resulting correlations do not relate the property to molecular structural
information, it is difficult to elucidate how molecular structure affects the observed
property.

One objective of the present work is to develop QSPR models for the
aqueous solubility of a wide variety of gaseous compounds. It is anticipated that, in
addition to their use for prediction, these equations should provide useful
information about the physical mechanisms determining the solubility of gases in water.
Notably, the study of the normal boiling points of a diverse set of organic molecules [cf.
Chapter II, 96JPC10400] suggested that the molecular size-dependent dispersion and
cavity-formation effects can be represented by a limited set of structural parameters,
whereas the specific hydrogen-bonding interactions in the molecule can be effectively
described by the area weighted surface charge on the hydrogen bonding donor atoms.
Thus, the second objective is to assess the applicability of such theoretical molecular
descriptors for the prediction of another condensed phase property, the solubility of gases
in liquids.

Results  and  Discussion 

The set of 408 organic compounds, previously used by Abraham et al.
[94JCSPTII1799] in their LSER treatment, was chosen for the QSPR treatment. Two
compounds (sulfur hexafluoride and triethyl phosphate) were eliminated from Abraham's
original set since the AM1 semi-empirical method used in this work for the calculation of
theoretical descriptors gives inadequate geometry and charge distribution for these
compounds. The remaining set of 406 compounds is still structurally very diverse and
includes saturated and unsaturated hydrocarbons, halogenated compounds, and
compounds containing hydroxyl, cyano, amino, nitro, thio, ester, ether, carbonyl and
carboxyl functional groups and furan, pyran, pyridine and pyrazine rings. However, the
set is compact enough to allow the calculation of the descriptors and the development of
the QSPR equations within a reasonable time frame. The structures were drawn and pre-
optimized with the MMX molecular mechanics method using the PCMODEL [92MI]
program. The final geometry optimization of compounds was performed on an IBM
RISC/6000 model 320 computer using the semi-empirical quantum-chemical AM1
[85JACS3902] parameterization with the MOPAC 6.0 program modified by the inclusion
of the self-consistent reaction field (SCRF) method [89TCN295]. The MOPAC results on
individual compounds were loaded into the CODESSA program [95MIk] along with the
experimental solubility data. The CODESSA program implements procedures which
enable the calculation of a large selection of non-empirical descriptors described in
Appendix I.

More than 600 molecular descriptors were calculated using CODESSA for all 406
compounds. The correlation analysis to find the best QSPR model of a given size was
carried out using the procedure based on the stepwise scale addition methods (see
Appendix II, the second strategy) [96CR1027]. A preselection of descriptors was
implemented.
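
The general idea of building a model by stepwise addition of descriptors can be sketched as follows. This is a generic forward selection limited to two descriptors, not the CODESSA procedure of Appendix II, whose details are not reproduced here. It uses the closed-form expression for the squared multiple correlation coefficient of a two-descriptor linear model in terms of pairwise Pearson correlations. The descriptor names and all numerical values are hypothetical and were chosen so that the property is an exact linear combination of "size" and "shape".

```python
def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def r2_pair(y, x1, x2):
    """R2 of the least-squares model y = a + b*x1 + c*x2 (closed form)."""
    r1, r2, r12 = pearson(y, x1), pearson(y, x2), pearson(x1, x2)
    return (r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * r12) / (1.0 - r12 * r12)


def forward_select_two(y, descriptors):
    """Pick the best single descriptor, then the best partner for it."""
    first = max(descriptors, key=lambda k: pearson(y, descriptors[k]) ** 2)
    second = max((k for k in descriptors if k != first),
                 key=lambda k: r2_pair(y, descriptors[first], descriptors[k]))
    return first, second, r2_pair(y, descriptors[first], descriptors[second])


# Hypothetical descriptor pool for eight compounds.
pool = {
    "size":  [1, 2, 3, 4, 5, 6, 7, 8],
    "shape": [0, 1, 0, 2, 1, 3, 2, 4],
    "other": [3, 1, 4, 1, 5, 9, 2, 6],
}
# Hypothetical property: exactly 2*size - 3*shape.
prop = [2 * s - 3 * h for s, h in zip(pool["size"], pool["shape"])]

first, second, joint_r2 = forward_select_two(prop, pool)
print(first, second, round(joint_r2, 4))
```

In this toy example the "shape" descriptor is almost uncorrelated with the property on its own, yet it completes the model once "size" is present. The same kind of behaviour is possible for real descriptors, which is why candidates are evaluated jointly rather than only one at a time.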

Similarly to the previous study [Chapter II, 96JPC10400], the correlation analysis
was started with a limited subset of hydrocarbons to avoid initial complexities that arise
from specific interactions between the solute and solvent (water). In that way, the
structural features attributable to size/mass/bulk related cohesiveness, as defined by
Cramer [80JACS1837], should become more apparent. The hydrocarbon subset contained
95 alkanes, cycloalkanes, alkenes, alkylarenes, and alkynes. The best two-parameter
equation obtained (R² = 0.9765, F = 1988, s = 0.20, average absolute error = 0.15, cf.
Figure 3-1) involved the gravitational index over all bonded atoms in the molecule, G_I
[95MIk], and the complementary information content (⁰CIC) (order 0) [86MIk]. The
gravitational index (G_I) is defined by Equation (2.1) in Chapter II. The ⁰CIC (order 0) is
defined by Equations (3.3) and (3.4):

\[ {}^{0}\mathrm{CIC} = \log_2 n - {}^{0}\mathrm{IC} \tag{3.3} \]

\[ {}^{0}\mathrm{IC} = -\sum_i \frac{n_i}{n} \log_2 \frac{n_i}{n} \tag{3.4} \]

where n_i is the number of atoms in the i-th class, and n is the total number of atoms in the
molecule. The complementary information content (⁰CIC) encodes the degree of
branching of a hydrocarbon molecule.
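
Equations (3.3) and (3.4) translate directly into a few lines of code. The sketch below assumes that the assignment of atoms to classes is supplied by the caller as a list of class labels; using the element symbol as the label in the example is an assumption made only for illustration, and the function names are hypothetical.

```python
import math
from collections import Counter


def information_content(atom_classes):
    """Eq. (3.4): IC = -sum_i (n_i/n) * log2(n_i/n) over atom classes."""
    n = len(atom_classes)
    counts = Counter(atom_classes)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def complementary_information_content(atom_classes):
    """Eq. (3.3): CIC = log2(n) - IC, with n the total number of atoms."""
    return math.log2(len(atom_classes)) - information_content(atom_classes)


# Example: methane, with atoms labelled by element (illustrative assumption).
methane = ["C", "H", "H", "H", "H"]
ic_methane = information_content(methane)
cic_methane = complementary_information_content(methane)
print(f"IC = {ic_methane:.4f}, CIC = {cic_methane:.4f}")
```

The information content is largest when the atoms are spread evenly over many classes, so the complementary quantity measures how far a molecule falls short of that maximum for its size.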

Remarkably, these two descriptors (G_I and ⁰CIC) individually correlate poorly
with gas solubility (R² = 0.3997 and 0.0453, respectively). The successive
correlation coefficients (R²), cross-validated correlation coefficients (R²cv) and the t-test
values for these two descriptors are given in Table 3-1. As reported in the previous study
on the normal boiling points [Chapter II, 96JPC10400], the gravitation index (G_I)
effectively describes the intermolecular dispersion forces in the bulk solvent (Figure 3-1).
However, this descriptor varies little among isomers, and thus the additional ⁰CIC index
accounts for the differences in the shape of these molecules. Evidently the combination of
these two descriptors, comprising size and shape information about molecules, adequately
represents the effective dispersion and cavity formation effects for the solvation of
non-polar solutes in water. According to Table 3-1, the solubility of hydrocarbons increases as
G_I increases, and the chain derivatives are more soluble than the branched derivatives.

Figure 3-1. Calculated vs. experimental gas solubilities of 95 hydrocarbons
using the best 2-parameter correlation equation (x-axis: experimental log Lw).

Table 3-1. The Best Two-Parameter Correlation of the Gas Solubility for the Data Set of
95 Hydrocarbons

Descriptor    X ± ΔX             t-test    R²        R²cv
Intercept     -1.37 ± 0.06       -23.16
G_I           0.0067 ± 0.0001    60.80     0.3997    0.3748
⁰CIC          -0.050 ± 0.001     -47.55    0.9765    0.9746
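Written out with the coefficients of Table 3-1, and assuming the usual linear regression form (log L^w is taken from the axis label of Figure 3-1), the two-parameter hydrocarbon correlation reads:

```latex
\log L^{w} = -1.37 + 0.0067\, G_I - 0.050\; {}^{0}\mathrm{CIC}
```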


Further treatment then proceeded with the complete set of 406 compounds to give
the five-parameter regression model with R² = 0.9158 (Figure 3-2). The most important
descriptor in this regression is the hydrogen-bonding related descriptor, HDCA(2),
defined by Equation (3.5):

$HDCA(2) = \sum_D \frac{q_D \sqrt{S_D}}{\sqrt{S_{tot}}}$  (3.5)

where q_D is the partial charge on the hydrogen bonding donor (H) atom(s), S_D denotes the
exposed surface area of this atom and S_tot is the total molecular surface area, calculated
from the van der Waals' radii of the atoms (overlapping spheres). The summation in
Equation (3.5) is performed over the number of simultaneously possible hydrogen bonding
donor and acceptor pairs in the solute molecule. Also, hydrogen atoms attached to
carbons connected directly to carbonyl or cyano groups were included as possible
hydrogen bonding donor centers (their effectiveness is, of course, much smaller because of
the smaller partial charge on them).

Figure 3-2. Calculated vs. experimental gas solubilities of 406 organic compounds
using the preliminary 5-parameter correlation equation from Table 3-2.

The additional descriptors in the five-parameter
equation (cf. Table 3-2) are: the energy gap between the highest occupied molecular
orbital and the lowest unoccupied molecular orbital (E_HOMO-LUMO), the numbers of
nitrogen atoms and of oxygen atoms in the molecule, and the most negative partial charge
weighted topological electronic index [86JC63] (PCWT^E), defined by Equation (3.6):

$PCWT^{E} = \frac{1}{Q_{min}} \sum_{i<j} \frac{|q_i - q_j|}{r_{ij}^{2}}$  (3.6)

where q_i and q_j are the Zefirov partial charges [87DAN883] of the bonded atoms, r_ij are the
respective bond lengths and Q_min is the most negative partial charge.
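The two charge-based descriptors can be sketched in a few lines. The sketch assumes that Equation (3.5) is a sum of q_D·sqrt(S_D)/sqrt(S_tot) over donor hydrogens and that Equation (3.6) is a sum of |q_i - q_j|/r_ij² over bonded pairs, divided by Q_min. The function names and all numbers are hypothetical and are not taken from the study.

```python
import math


def hdca2(donors, s_tot):
    """Eq. (3.5) as read here: sum of q_D * sqrt(S_D) / sqrt(S_tot).

    donors: iterable of (q_D, S_D) pairs for the donor hydrogen atoms.
    s_tot:  total molecular surface area.
    """
    return sum(q_d * math.sqrt(s_d) for q_d, s_d in donors) / math.sqrt(s_tot)


def pcwt_e(bonds, q_min):
    """Eq. (3.6) as read here: (1 / Q_min) * sum |q_i - q_j| / r_ij**2.

    bonds: iterable of (q_i, q_j, r_ij) triples for bonded atom pairs.
    q_min: most negative partial charge in the molecule.
    """
    return sum(abs(q_i - q_j) / r_ij ** 2 for q_i, q_j, r_ij in bonds) / q_min


# Hypothetical inputs, for illustration only.
example_hdca2 = hdca2(donors=[(0.25, 4.0)], s_tot=100.0)
example_pcwt = pcwt_e(bonds=[(0.25, -0.25, 1.0), (-0.25, 0.0, 0.5)], q_min=-0.25)
```

In practice the charges, surface areas and bond lengths would come from the quantum-chemical output rather than being typed in by hand.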

The successive values of R², R²cv, and the Fisher F-value (F) for this correlation
are listed in Table 3-2. The HOMO-LUMO energy gap relates to the dispersion energy
of polar solutes in solution [94JPC5817, 73AQC289], whereas the most negative partial
charge weighted topological electronic index corresponds to the electrostatic part of the
solute-solvent interaction. HDCA(2) was shown to adequately represent the specific
hydrogen-bonding interactions in pure liquids [Chapter II, 96JPC10400].


Table 3-2. The Preliminary Five-Parameter Correlation of the Gas Solubility for the Data
Set of 406 Diverse Structures

Descriptor         R²        R²cv      F
Intercept
HDCA(2)            0.5225    0.5182    442.1
E_HOMO - E_LUMO    0.6940    0.6901    457.1
N(N)               0.7899    0.7863    504.7
PCWT^E             0.8805    0.8771    739.2
N(O)               0.9158    0.9130    870.3

The inclusion of the number of oxygen and nitrogen atoms in the correlation model
(cf. Table 3-2) may be due to a deficiency of AM1 calculated charges in properly describing
the electrostatic and hydrogen bonding interactions that involve these elements
in the solute molecules. Notably, the use of a linear combination of the number of
nitrogen and oxygen atoms [2*N(N) + N(O)] yielded a four-parameter correlation with
R² = 0.9158 (Figure 3-3), identical to that of the five-parameter correlation given in
Table 3-2 (R² = 0.9158). Proceeding from this four-parameter equation, it was found that still
another descriptor, the number of rings, offers a statistically significant improvement to
the regression model. Thus, the final five-parameter correlation given in Table 3-3 has
R² = 0.9407, F = 1269, and s = 0.53 (averaged absolute error = 0.42) (Figure 3-4). The
successive regression coefficients for each descriptor and the respective standard errors,
the R² and R²cv values, and the t-test for this equation are given in Table 3-3.

Figure 3-3. Calculated vs. experimental gas solubilities of 406 organic
compounds using the best 4-parameter correlation equation from Table 3-3.

Figure 3-4. Calculated vs. experimental gas solubilities of 406 organic
compounds using the final 5-parameter correlation equation from Table 3-3.

Table 3-3. The Final Best Five-Parameter Correlation of the Gas Solubility for the Data Set
of 406 Diverse Structures

Descriptor        X ± ΔX           t-test    R²        R²cv
Intercept         2.82 ± 0.22      12.92
HDCA(2)           41.61 ± 1.11     37.44     0.5225    0.5182
N(O) + 2*N(N)     0.71 ± 0.02      28.41     0.6845    0.6801
E_HOMO-LUMO       -0.17 ± 0.02     -9.42     0.8792    0.8763
PCWT^E            0.13 ± 0.01      19.03     0.9158    0.9134
N_rings           0.79 ± 0.06      12.96     0.9407    0.9386
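The final correlation can be evaluated directly from the coefficients of Table 3-3. The sketch below assumes the usual linear regression form and that descriptor values are supplied in the units used in the study. The function and argument names are illustrative only, and no descriptor values for real compounds are implied.

```python
# Regression coefficients X from Table 3-3 (final five-parameter correlation).
COEFFICIENTS = {
    "intercept": 2.82,
    "hdca2": 41.61,
    "n_o_plus_2n_n": 0.71,
    "homo_lumo_gap": -0.17,
    "pcwt_e": 0.13,
    "n_rings": 0.79,
}


def predict_log_l(hdca2, n_oxygen, n_nitrogen, homo_lumo_gap, pcwt_e, n_rings):
    """Evaluate the linear five-parameter model for the gas solubility (log L)."""
    c = COEFFICIENTS
    return (
        c["intercept"]
        + c["hdca2"] * hdca2
        + c["n_o_plus_2n_n"] * (n_oxygen + 2 * n_nitrogen)
        + c["homo_lumo_gap"] * homo_lumo_gap
        + c["pcwt_e"] * pcwt_e
        + c["n_rings"] * n_rings
    )


# With every descriptor set to zero the model returns the intercept.
baseline = predict_log_l(0.0, 0, 0, 0.0, 0.0, 0)
```

Because nitrogen and oxygen enter only through N(O) + 2*N(N), one nitrogen atom contributes exactly as much as two oxygen atoms in this model.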

None of the five descriptors in the final model rigorously accounts for the mass/size
and shape related intermolecular interactions, and therefore the calculated solubilities for
compounds with the same functionality, but different size, do not vary appropriately. For
example, the solubilities calculated for benzene and biphenyl differ by only 0.07 log units,
but their actual solubility difference is 1.32 log units. Also, the calculated difference in the
solubility of formaldehyde and acetaldehyde is 0.10 log units while the observed difference
is 0.55 log units.
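These discrepancies are easier to appreciate on a linear scale. Assuming the solubilities are decadic logarithms and that the calculated and observed differences have the same sign, the error in the predicted solubility ratio is roughly a factor of 18 for benzene/biphenyl and a factor of 3 for formaldehyde/acetaldehyde:

```latex
10^{\,1.32 - 0.07} = 10^{1.25} \approx 17.8, \qquad
10^{\,0.55 - 0.10} = 10^{0.45} \approx 2.8
```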

An attempt was made to select the best 5-parameter model with the mass/size and
shape descriptors (G_I and ⁰CIC) forced into the regression equation for all 406
compounds. However, these two descriptors were found to have a much less significant
impact than the hydrogen-bonding related descriptors. Evidently, the
intermolecular interaction mechanisms determining the solubility of compounds in water
are different for hydrocarbons as compared to molecules containing polar groups. In low
polarity solvents, such differences are less apparent: for instance, it was found that the
solubility of (i) 392 diverse gases and vapors in hexadecane (the data are cited from Ref.
[94JCSPTII1799]) and (ii) their hydrocarbon subset are both well correlated with the
gravitation index (G_I) (Table 3-4).

Although the 5-parameter model is statistically less precise than the equation
developed by Abraham et al. [94JCSPTII1799] utilizing five empirical LSER descriptors,
the equation is more general because it uses only descriptors that can be derived solely
from the structure of compounds. Thus, the equation can be used for the calculation of
solubilities of all compounds, including those for which some of the experimentally
measured parameters required by Abraham's equation are not available. As compared to
the five-parameter model by Jurs and coworkers [92AC1350], the correlations developed
here are applicable to a much wider variety of chemical structures. The stabilities of the
correlation equations presented in Tables 3-1, 3-2 and 3-3 are characterized by the
corresponding cross-validated correlation coefficients (R²cv) [95MIk], which are almost
identical to the correlation coefficients themselves.

Table 3-4. The One-Parameter Correlation of the Gas Solubility in Hexadecane for the
Data Set of 392 Diverse Compounds and Their Hydrocarbon Subset

Descriptor      X ± ΔX             t-test    R²        F         s
Intercept (a)   0.015 ± 0.006      0.27
Intercept (b)   0.08 ± 0.08        1.01
G_I (a)         0.0055 ± 0.0001    43.69     0.8329    1908.7    0.53
G_I (b)         0.0065 ± 0.0001    95.30     0.9903    9082.5    0.15

(a) For 392 non-fluorinated diverse compounds. (b) For the subset of 95 hydrocarbons.

As already mentioned, all of the structural descriptors used in the correlation
equations were calculated from the AM1 [85JACS3902] optimized molecular geometries
and charge distributions obtained in solution using the self-consistent reaction field
(SCRF) approach [89TCM295]. To investigate how the descriptor values and correlation
equations depend on the solvent, the descriptors and regressions for the isolated molecules
were recalculated using the same (AM1) parameterization. The five-parameter correlation
equation thus developed (Equation 3.7) was very similar to that obtained using SCRF
molecular descriptors. This result encourages the use of the quantum-chemical data
derived from isolated molecules for the description of solvation processes. However, a
systematic study on the basis of a wider selection of solvation properties is needed for the
generalization of this conclusion.

$\log L^{w} = (2.65 \pm 0.22) + (42.37 \pm 1.11)\,HDCA(2) + (0.65 \pm 0.02)[N(O) + 2N(N)] + (-0.16 \pm 0.02)\,E_{HOMO-LUMO} + (0.12 \pm 0.01)\,PCWT^{E} + (0.82 \pm 0.06)\,N_{rings}$  (3.7)

R² = 0.9420, F = 1300, s = 0.52 (averaged absolute error = 0.420)
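The cross-validated correlation coefficients quoted with these models can be illustrated with a leave-one-out sketch for a one-descriptor fit. It assumes that R²cv is computed as 1 - PRESS/SS_tot with each compound predicted from a fit that excludes it. The exact CODESSA definition is not given in this section, and the data below are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def r_squared(xs, ys):
    """Squared correlation coefficient of the full fit."""
    a, b = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def r_squared_loo(xs, ys):
    """Leave-one-out cross-validated R^2 (assumed form: 1 - PRESS / SS_tot)."""
    mean_y = sum(ys) / len(ys)
    press = 0.0
    for i in range(len(xs)):
        # Refit without point i, then predict point i from that fit.
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Hypothetical descriptor/property pairs, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
r2 = r_squared(xs, ys)
r2_cv = r_squared_loo(xs, ys)
```

A cross-validated value close to the ordinary R², as reported for Tables 3-1 to 3-3, indicates that no single compound dominates the fit.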

Conclusions 

The QSPR models for the solubilities of structurally widely variable organic
molecules in water have been successfully developed using correlation equations with a
limited number of theoretical molecular descriptors. These descriptors are mostly derived
from the quantum-chemical charge distribution of the molecules, and have definite
physical meaning corresponding to different solute-solvent interactions in solutions.
Notably, all the descriptors involved are calculated solely from the chemical structure of
compounds and, therefore, the prediction of the aqueous gas solubility can now be made
for any compound, even for those not yet synthesized. A satisfactory model of this kind
should therefore be useful in the development of new technologically and biomedically
important materials and compounds. Meanwhile, the combination of the gravitation index
and information content adequately represents the effective dispersion and cavity formation
effects for the solvation of non-polar solutes in water. The area-weighted surface charge
of hydrogen bonding donor atoms sufficiently represents the specific hydrogen-bonding
interactions in solutions for polar molecules.


CHAPTER  IV 

QSPR  TREATMENT  OF  THE  UNIFIED  NON-SPECIFIC  SOLVENT  POLARITY 
SCALE    TOWARDS  EXPLORING  THE  PHYSICAL  MEANING 
OF  AN  EMPIRICAL  SCALE 

Introduction 

The  solute-solvent  interactions  have  pronounced  influences  on  many  chemical  and 
physical  phenomena  in  solutions.  Early  on,  the  theoretical  evaluation  of  solvent  effects 
was  attempted  by  using  simple  macroscopic  solvent  parameters  such  as  the  static  dielectric 
constant,  permanent  dipole  moment,  and  refractive  index,  or  their  functions  in 
combination  [34JCP351,  39JCP911,  36JACS1486,  73CPL363,  85JPC5759].  Such 
approaches  neglect  the  specific  solute/solvent  interactions,  which  take  place  at  the 
molecular  level,  and  often  result  in  failure  when  correlating  observed  solvent  effects  with 
the  macroscopic  solvent  parameters.  Even  in  the  absence  of  specific  interactions,  solvents 
reorganize  in  real  solutions  to  form  cavities  which  accommodate  the  solute  molecules  with 
stabilization  resulting  from  the  interaction  of  the  solute  dipole  (and  induced  dipole)  with 
the  internal  relative  permittivity  of  the  cavity  [92JCSPTII1827].  Solvent  reorganization 
and  induced  dipole  moments  tend  to  create  an  internal  permittivity  different  from  the  bulk 
relative  permittivity  [92JCSPTII1827].  Not  only  is  the  internal  permittivity  of  the 
organized solvent region difficult to quantify but so are the dimensions of the cavity
surrounding different solute molecules in a solvent. Assumptions are invariably made to
overcome  these  problems  [92JCSPTII1827].  Consequently,  it  has  not  been  possible  to 
define  the  solvent  polarity  in  terms  of  macroscopic  solvent  properties.  Thus,  empirical 
solvent  polarity  scales  have  evolved  as  an  alternative  approach  for  predicting  or  analyzing 
solvent  effects. 

Numerous  polarity  scales  have  been  reported  to  assist  chemists  in  correlating 
various chemical phenomena [77JACS6027, 83JOC2877, 88QSAR71, 88MIr, 94CR2319]
in  different  solvents.  A  disadvantage  of  such  scales  is  that  they  reflect,  in  addition  to  the 
nonspecific  effects,  variable  specific  effects,  which  are  dependent  on  the  type  of  probe 
(solute)  used  to  develop  each  scale.  As  a  result,  a  scale  that  works  for  one  system  may 
not  be  suitable  for  another. 

A unified solvation model has been proposed by Drago to treat solvation (Equation 4.1)
[92JCSPTII1827, 92JOC6547, 94JCSPTII219, 94JCC145, 94JACS7509, 95JPC6563,
96JCSPTIIsub]:

$\Delta\chi = PS' + E_A^{*}E_B + C_A^{*}C_B + W$  (4.1)

In Equation (4.1), Δχ is the solvent dependent physicochemical property, P is the
susceptibility of the solute probe to polarity, S' is the solvent polarity scale derived from
experimental observations, E_A and E_B are the electrostatic parameters of acid and base, C_A
and C_B are the covalent parameters of acid and base, and W is a constant. Such a model
separates solvent effects into nonspecific (PS') and specific (E_A*E_B + C_A*C_B) interactions.
The former arises from the electrostatic forces, including polarization forces that arise from
dipole-induced dipole moments between the solvent and charged ions or dipolar
molecules. The latter comprises electron-pair donor and acceptor interactions, which
include hydrogen-bonding and π-π* charge transfer.
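The separation into nonspecific and specific contributions can be made explicit with a small sketch of Equation (4.1). The function name and all probe and solvent parameters below are hypothetical; only the additive structure P·S' + E_A·E_B + C_A·C_B + W is taken from the text.

```python
def unified_solvation(p, s_prime, e_a, e_b, c_a, c_b, w):
    """Eq. (4.1): return the (nonspecific, specific, total) contributions."""
    nonspecific = p * s_prime            # P * S'
    specific = e_a * e_b + c_a * c_b     # electrostatic + covalent donor-acceptor terms
    return nonspecific, specific, nonspecific + specific + w


# Hypothetical parameters, for illustration only.
nonspecific, specific, total = unified_solvation(
    p=2.0, s_prime=3.0, e_a=0.5, e_b=2.0, c_a=1.0, c_b=0.25, w=1.5
)
```

Setting the acid or base parameters to zero leaves only the P·S' term plus the constant, which is the situation the S' scale is designed to describe.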

The  S'  scale  encompasses  a  wide  variety  of  solutes  and  solvents,  but  excludes  the 
specific  solute-solvent  interactions.  Thus  the  non-specific  interactions  are  unified  and  the 
reported  disparities  between  the  scales  are  attributed  to  specific  effects.  Deviations 
resulting  from  predictions  using  S'  allow  an  analysis  of  the  factors  that  underlie  the 
unusual  effects,  instead  of  finding  correlations  by  searching  for  another  scale 
[92JCSPTII1827]. 

In addition to providing empirical solvation parameters, trends in P values lead to a
dynamic cavity model for solvation. The same S' value can be used for a wide range of
solute shapes and sizes [92JCSPTII1827]. For less polar solvents, such as cyclohexane,
CCl4, toluene, benzene and other alkanes, the S' values correlate well with Hildebrand's
solubility parameter [96JCSPTIIsub]. For basic polar solvents, the S' values fit well in a
correlation equation which involves both Hildebrand's solubility parameter and Hansen's
dispersion solubility parameter [96JCSPTIIsub]. These observations indicate that S'
encompasses both solvent dispersion and dipolar effects. Accordingly, S' values were
found to vary with the relative permittivity ε, dipole moment μ and Kirkwood function
(ε - 1)/(2ε + 1) of the solvents [92JCSPTII1827]. Furthermore, a QS'² parameter, with Q being
a solute dependent factor, was found to be a measure of the energy needed to create a
cavity to accommodate the solutes [96JCSPTIIsub].

The unified solvation model has been very successful in many applications
involving nonspecific solvation [92JOC6547, 94JCSPTII219, 94JCC145, 94JACS7509,
95JPC6563, 96JCSPTIIsub]. In order to reveal the physical meaning of S', Drago and
Richardson [96MIpc] proposed two empirical functions to fit S' of 29 solvents. These
two functions are (1 - 1/ε) and (1 - 1/n²)/MV, where ε and n represent the dielectric
constant and refractive index of a solvent, and MV is the molar volume of the solvent.
The correlation was weighted, and an R² of 0.99 and a standard error of 0.07 with an F-value of
1329 were achieved [96MIpc]. Since more ε and n values are available than S' values (only
16 S' values have the accuracy weighted at 1 on a 0 - 1 scale), the correlation provides a
useful equation for S' prediction. In addition, the two empirical functions indicate that the
dipolarity and dispersion effects of a solvent, incorporated in S', are nonlinearly expanded
by refractive indices and dielectric constants. A similar result has been reported in an
empirical modeling of solvent polarity scales containing both dipolarity and polarizability
effects [96GCI307].
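The two empirical functions are simple enough to evaluate by hand, and a short sketch makes the inputs explicit. The function names are illustrative. The dielectric constants are the values listed for acetone and benzene in Table 4-3 later in this chapter, and the molar volume must be supplied by the user because it is not tabulated in this section.

```python
def dielectric_function(epsilon):
    """First empirical function: 1 - 1/epsilon."""
    return 1.0 - 1.0 / epsilon


def refraction_function(n, molar_volume):
    """Second empirical function: (1 - 1/n**2) / MV."""
    return (1.0 - 1.0 / n ** 2) / molar_volume


# Dielectric constants as listed in Table 4-3.
acetone_f1 = dielectric_function(20.70)   # Table 4-3 lists 0.9517
benzene_f1 = dielectric_function(2.27)    # Table 4-3 lists 0.5595
```

Both functions saturate as ε or n grows, which is one way to read the remark that S' depends nonlinearly on these solvent properties.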

Up  to  this  point,  it  is  not  clear  how  the  fundamental  structural  features  of  solvents 
influence  the  solvation  capabilities  of  the  solvents.  Furthermore,  for  many  solvents 
experimental properties (dielectric constant, refractive index, spectral shifts, etc.) are not
available  to  estimate  S'.  An  approach  which  determines  S'  using  fundamental  molecular 
structural  features  would  be  valuable. 

Chapters II and III have demonstrated that the QSPR technique can be used to
establish the relationship between physical and chemical properties and the molecular
features that characterize them. By the same token, QSPR can be used to
build a link between an empirical scale and molecular descriptors. Since the descriptors
involved in a QSPR model are calculated solely from the chemical structures of
compounds, the influence of structural features on solvation can be elucidated, and the
applicability of S' can be extended to any compound, known or unknown, without
synthesizing it or making any measurement.

Famini and coworkers [92QSAR162, 93JCSPTII773, 93ACR599, 95CC209,
96JCSPTII83] pioneered the development of theoretical solvent parameters. Their TLSER
(theoretical linear solvation energy relation) was developed by analogy to the LSER (linear
solvation energy relation) of Kamlet and Taft [83JOC2877]. Its success in many
applications demonstrated that complex solvent effects can be explored using fundamental
molecular properties determined by quantum chemical techniques. Furthermore, previous
investigations [Chapter II, 96JPC10400] showed that by deliberately screening the
redundant and overlapping factors in descriptor space, quantum-chemical parameters that
resemble the intrinsic dimensions of a property can be established.

This chapter aims at the development of a model using non-empirical parameters
that are selected, a priori, to have potential meaning in the context of the solvation
process. This approach will be referred to as MQSPR, a model based QSPR approach.
Such a model could be used with more confidence for the prediction of S' and may
provide insight into the intrinsic molecular structural features that influence S'. The
results from weighted correlations, which incorporate the weights associated with S', are
compared to the unweighted correlation. The former will be applied to a set of 29
solvents whose S' values were weighted between 0 and 1.0, and which has been used by Drago and
Richardson to develop the empirical correlation. The latter will be applied to another set
of 29 solvents, for which the S' values are weighted between 0.8 and 1.0. The weighted
correlation results will be used to explore the physical meaning of S'.

Results and Discussion

The experimentally derived S' data for 67 solvents were assembled from the
literature [94MId, 96JCSPTIIsub, 96IC239, 96MIpc]. This solvent set included saturated
and unsaturated hydrocarbons, halogenated compounds, and compounds containing
cyano, nitro, amide, sulfide, mercapto, sulfone, phosphate, ester, ether, carbonyl groups
and furan, pyran, dioxane, pyridine, aniline, quinoline, imidazole, pyrrolidinone, and
pyrazine rings. Structures of the solvents were drawn and pre-optimized by the MMX
molecular mechanics method using the PCMODEL [92MI] program. The final geometry
optimization of these compounds was performed on an IBM RISC/6000 model 320
computer using the semi-empirical quantum-chemical AM1 parameterization with the
MOPAC 6.0 program, which was modified by incorporation of the self-consistent reaction
field (SCRF) method [89TCM295]. The MOPAC results for individual compounds were
loaded into the CODESSA program [95MIk] along with the S' data.

The CODESSA program implements procedures which enable the calculation of a
large selection of non-empirical descriptors described in Appendix I. In this work, about
300 molecular descriptors were calculated using CODESSA for all 67 compounds.
Descriptors that are solely associated with a specific constituent, such as the number of C
atoms, the minimum atomic state energy for a C atom, and the average valence of a C
atom, were not included as they were considered to be of little relevance. The
correlation analysis to find the best QSPR model of a given size was carried out using the
second strategy described in Appendix II, based on the stepwise scale addition methods
[95MIk].

The  Unweighted  Correlation 

A set of 29 structurally diverse solvents, whose S' accuracy has been weighted as
0.8 or 1.0 on a 0.0 - 1.0 scale, was selected to develop the QSPR model. The best
one-parameter equation involved the total dipole of the molecule. Other important descriptors
include: (i) the average information content (order 0), denoted as AIC⁰ [86MIk], and (ii)
the Onsager-Kirkwood solvation energy, denoted as E_Onsager (Table 4-1). The AIC⁰ is
defined by Equation (4.2), where n_i is the number of atoms in the i-th class, and n is the total
number of atoms in the molecule. It encodes the branching ratio and constitutional
diversity of a molecule. The Onsager-Kirkwood solvation energy is given by Equation
(4.3), where ε is the macroscopic dielectric constant of the solvent and μ is the total dipole
moment of the solvent.

$AIC^{0} = -\sum_i \frac{n_i}{n} \log_2 \frac{n_i}{n}$  (4.2)

$E_{Onsager} = \frac{(\varepsilon - 1)\,\mu^{2}}{2\varepsilon + 1}$  (4.3)

Table 4-1. The One-Parameter Unweighted Correlations for the 29 Diverse Solvents

Descriptor                                   R² (weights 0.8 & 1.0)
Total Dipole Moment / Molar Liquid Volume    0.8579
Total Dipole Moment                          0.7839
Onsager-Kirkwood solvation energy            0.7264
Average Information Content (order 0)        0.7213


The importance of the total dipole moment of the molecule and of the
Onsager-Kirkwood solvation energy to S' has been well established [92JCSPTII1827,
94MId]. The significance of AIC⁰ is in agreement with the previous study on the solubility
of gaseous hydrocarbons in water [Chapter III, 96JCICS1162]. Prompted by the
involvement of MV in (1 - 1/n²)/MV, a new descriptor, the magnitude of the total dipole
moment of a molecule (μ) divided by the molar volume (MV), was constructed. The
resulting descriptor, μ/MV, substantially improved the one-parameter correlation (R² =
0.8569, s = 0.20, and F = 165, Figure 4-1). The composite descriptor is physically
justified since the influence of molecular dipole moments on solvation can be affected by
molecular size, i.e. a smaller solvent molecule with a given dipole moment will lead to a
larger total electrostatic interaction of the solvent with the solute than a larger molecule
with the same dipole moment (fewer of which can be packed around the solute).

Figure 4-1. Calculated vs. experimental S' of 29 diverse solvents using the best
unweighted 1-parameter correlation equation.

Adding a second descriptor, the Fractional Partial Negative Surface area (FPNS),
defined by Equation (4.4), gave a two-parameter model with R² raised to 0.8862 (Figure
4-2), and the standard average deviation decreased from 0.20 to 0.18 units. In Equation
(4.4), S_- is the partial surface area of the negatively charged atom(s) and S_tot is the total
molecular surface area, calculated from the van der Waals' radii of the atoms (overlapping
spheres). The FPNS descriptor measures the solvent/solute accessible surface areas of the
negatively charged atoms in a molecule. Although FPNS helped to bring some calculated
S' values closer to the regression line, it did not correct the significant over-prediction for
nitrobenzene. When a third descriptor, the maximum coulombic interaction for a N-O
bond (F_max(N-O)), was added, the molecules containing a nitro group fell back onto the
regression line and the R² was enhanced to 0.9247 with s = 0.15, F = 102 (Figure 4-3). The
remaining most distinct outliers were cyclohexane, triethylamine and 1,4-dioxane.

$FPNS = \frac{\sum S_{-}}{S_{tot}}$  (4.4)

In spite of good statistics, patterns in the deviations indicated that the total dipole
moment of the molecule, in combination with electrostatic parameters, could not
comprehensively represent S'. The correlation over-predicted most of the low S' solvents in
the set and under-predicted the highest S' values of a symmetrical molecule.
Nevertheless, the result suggested that the dipolarity of the solvents incorporated in S' can
be represented by the molar volume weighted total dipole moment of a solvent.

Figure 4-2. Calculated vs. experimental S' of 29 diverse solvents using the
best unweighted 2-parameter correlation equation.

Figure 4-3. Calculated vs. experimental S' of 29 diverse solvents using the best
unweighted 3-parameter correlation equation.

More parameters (up to five) can be employed to yield a better correlation.
However, the additional descriptors were not physically significant. Thus, the best
3-parameter correlation is considered to be physically more meaningful while remaining
statistically significant. The successive regression coefficients for each descriptor and the
respective standard errors, the R² and R²cv values, and the t-test for this correlation are
given in Table 4-2. The quality of the model is characterized by the corresponding
cross-validated correlation coefficients (R²cv) [95MIk]. The prediction of S' for all 67 solvents was
performed, and the predicted values along with the observed values are provided at the
end (Table 4-9) for comparison. Knowing that the S' values weighted at 0.8 could have
an averaged absolute error of 0.2, the 38 predicted S' data, for which the accuracy of the
observations was weighted from 0 - 0.6 on a 0 - 1.0 scale, should be better references for
the nonspecific solvent polarity scale.
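Written out with the coefficients of Table 4-2, and assuming the usual linear regression form, the unweighted three-parameter correlation reads as follows. The first descriptor is taken to be μ/MV, as the R² value of 0.8569 indicates, and the second descriptor is labelled FNSA-2 in the table and FPNS in the text.

```latex
S' = 1.38 + 20.30\,\frac{\mu}{MV} - 1.08\,\mathrm{FNSA\text{-}2} - 0.04\,F_{max}(\mathrm{N{-}O})
```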

Table 4-2. The Best Unweighted Three-Parameter Correlation of the S' for the 29
Diverse Solvents

Descriptor     X ± ΔX           t-test    R²        R²cv
Intercept      1.38 ± 0.06      23.97
μ/MV           20.30 ± 1.38     14.72     0.8569    0.8347
FNSA-2         -1.08 ± 0.17     -6.21     0.8860    0.8431
F_max(N-O)     -0.04 ± 0.01     -4.10     0.9247    0.8973

The Weighted Correlation

The theoretical model provided in Table 4-2 functions as a predictive tool for S'.
However, the intrinsic dimensions of S' are not readily identified in such a model. It is
possible that the second and third descriptors in the 3-parameter correlation mainly fit
the imperfections of the S' values and the error associated with the molar volume
weighted dipole moments, or compensate for a more meaningful property that accounts for
the polarizability of the solvents. Since a weighted correlation gives more significance
to the more reliable data than to questionable ones, there is a better chance of resolving
the real dimensions via weighted correlation for properties or scales whose weights are
known.
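
As an illustration of what a weighted correlation does, the following minimal sketch
fits a weighted least-squares model. It assumes the standard objective, the sum over
solvents of w_i times the squared residual; how CODESSA applies the weights internally
is not described here, and the name weighted_fit is hypothetical.

```python
import numpy as np

def weighted_fit(X, y, w):
    """Weighted least squares: minimise sum_i w_i * (y_i - a0 - sum_j a_j * x_ij)^2."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    # Design matrix with an intercept column followed by the descriptor column(s).
    A = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    # Scaling each row by sqrt(w_i) turns the weighted problem into an ordinary one.
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    resid = y - A @ coef
    y_bar = np.average(y, weights=w)
    # Weighted R^2: points with weight 0 contribute to neither sum.
    r2 = 1.0 - np.sum(w * resid**2) / np.sum(w * (y - y_bar) ** 2)
    return coef, r2
```

A data point with weight 0 (as assigned to carbon disulfide in Table 4-3) has no
influence on the fitted coefficients, while a point with weight 1 counts fully.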

The S' values for the 29 solvents used in the correlation with 1-1/ε and (1-1/n²)/MV
were selected as the primary working set for MQSPR. Table 4-3 provides the names of the
solvents, the corresponding S' values and their weights. The refractive indices and
dielectric constants of the solvents, collected from DIPPR, are also listed in Table 4-3
along with the two empirical functions. The weights of the S' values, ranging from 0 to
1.0, were loaded into CODESSA along with the two empirical functions. The empirical
functions were used as a guide to assist the search for the quantum-chemical descriptors
that represent the intrinsic dimensions of S'.

Table 4-3. The 29 Diverse Solvents and Their S' Values and the Corresponding Weights,
Refractive Index (n), Dielectric Constants (ε), and the Two Empirical Functions.

Solvent                  S'      n        ε       (1-1/n²)/MV   1-1/ε    Wt.
1,2-Dichlorobenzene      2.10    1.5510    9.93    5.19         0.8993   1
1,4-Dioxane chair        1.93    1.4220    2.21    5.93         0.5475   1
2-Butanone               2.50    1.3790   18.50    5.30         0.9459   0.8
Acetone                  2.58    1.3590   20.70    6.24         0.9517   1
Acetonitrile             3.00    1.3440   37.50    8.54         0.9733   1
Anisole                  2.04    1.5180    4.33    5.17         0.7691   0.6
Benzene                  1.73    1.5011    2.27    6.26         0.5595   1
Benzonitrile             2.63    1.5289   25.20    5.61         0.9603   1
Bromobenzene             2.10    1.5570    5.40    5.69         0.8148   0.4
Carbon disulfide         1.51    1.6320    2.64   10.40         0.6212   0
Carbon tetrachloride     1.49    1.4660    2.24    5.54         0.5532   1
Chlorobenzene            1.98    1.5240    5.62    5.60         0.8221   0.8
Cyclohexane              1.11    1.4262    2.02    4.71         0.5050   1
Di-n-butyl ether         1.58    1.3992    3.08    2.90         0.6753   0.8
Dichloromethane          2.08    1.3300    9.08    6.83         0.8899   0
Diethyl ether            1.73    1.3520    4.34    4.36         0.7693   1
DMSO                     3.00    1.4780   46.48    7.64         0.9785   1
Ethyl acetate            2.15    1.3720    6.02    4.79         0.8339   1
N,N-Dimethylacetamide    2.70    1.4351   37.80    5.53         0.9735   1
N,N-Dimethylformamide    2.80    1.4269   36.71    6.57         0.9728   1
n-Decane                 0.90    1.4121    1.99    2.56         0.4977   0.2
n-Hexane                 0.68    1.3750    1.89    3.6          0.4709   0.2
Nitrobenzene             2.61    1.5562   34.82    5.74         0.9713   1
Nitromethane             3.07    1.3935   35.87    9.02         0.9721   0.8
Pyridine                 2.44    1.5100   12.40    6.97         0.9194   1
Tetrahydrofuran          2.08    1.4070    7.58    6.10         0.8681   1
Toluene                  1.66    1.4970    2.38    5.21         0.5797   0.9
Trichloromethane         1.74    1.4440    4.81    6.50         0.7919   0
Triethylamine            1.43    1.4010    2.42    3.52         0.5868   1
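
The two empirical functions in Table 4-3 follow directly from n, ε and the molar volume.
The sketch below shows the arithmetic; the function names are hypothetical. The tabulated
(1-1/n²)/MV column appears to be multiplied by 10³ with MV in ml/mole. That scaling is an
inference from the numbers, not something stated in the table.

```python
def dielectric_function(eps):
    """1 - 1/eps, the dielectric-constant function tabulated in Table 4-3."""
    return 1.0 - 1.0 / eps

def refraction_function(n, molar_volume):
    """(1 - 1/n^2) / MV, the refractive-index function per unit molar volume."""
    return (1.0 - 1.0 / n**2) / molar_volume

# Acetone: eps = 20.70 and n = 1.3590 (Table 4-3), experimental MV = 73.9 ml/mole (Table 4-6).
acetone_dielectric = dielectric_function(20.70)
# Multiplied by 1000 under the assumed scaling of the tabulated column.
acetone_refraction = 1000.0 * refraction_function(1.3590, 73.9)
```

The dielectric function reproduces the tabulated 0.9517 for acetone. The refraction
function gives about 6.2 against the tabulated 6.24, presumably because a slightly
different molar volume was used in the original calculation.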

The correlation analysis was first conducted for 1-1/ε and (1-1/n²)/MV. Three
descriptors were found to be important for 1-1/ε: the total dipole moment divided by the
molar volume (μ/MV), the zero order average information content (⁰AIC, eq 4.2)
[86MIk], and the Onsager-Kirkwood solvation energy (E_Onsager, eq 4.3). Other
descriptors, such as the α polarizability, the LUMO energy, and the molar volume, had
little correlation with 1-1/ε.

The correlation analysis of (1-1/n²)/MV gave better results with the α polarizability,
the LUMO energy, and the molar volume than with μ/MV and ⁰AIC.

A sample collection of representative descriptors and their corresponding values for
both 1-1/ε and (1-1/n²)/MV is given in Table 4-4.


Table 4-4. The Weighted Correlations of 1-1/ε and (1-1/n²)/MV for 29 Diverse Solvents.

Descriptor                                          R² (1-1/ε)   R² ((1-1/n²)/MV)
total dipole moment / molar volume                  0.7466       0.2492
average information content (order 0)               0.7381       0.1395
image of the Onsager-Kirkwood solvation energy      0.5625       0.2087
fractional partial negative charged surface area    0.3290       0.0906
molar volume                                        0.2167       0.6541
LUMO energy                                         0.2060       0.3349
HOMO - LUMO energy gap                              0.1266       0.1684
α polarizability                                    0.1206       0.3393


Table 4-4 suggests that both of the empirical functions contain dipolarity and
polarizability components. 1-1/ε mainly represents dipolarity, while (1-1/n²)/MV
encodes more polarizability features. Since the molar volume weighted dipole moment
accounts for the dipolarity of the solvents, another quantum-chemical descriptor that
represents the polarizability of the solvents needs to be determined. Such a descriptor in
combination with the molar volume weighted dipole moment should fit S'.

A number of descriptors known to be associated with polarizability were tried with
μ/MV in two-parameter correlation analyses. When any one of four descriptors was added
to μ/MV, an improved correlation resulted. Table 4-5 lists the descriptors, the
corresponding 2-parameter correlation coefficients, the t-test, the F-values, the
intercepts, and the coefficients (a1 and a2) of the two parameters. The best
two-parameter correlation involves the volume weighted total dipole of the molecule
(μ/MV) and the HOMO-LUMO energy gap (E_HOMO-LUMO). The HOMO-LUMO energy gap relates to
the dispersion energy of solvents [73AQC289, 94JPC5817]. A bigger HOMO-LUMO energy gap
corresponds to a smaller polarizability. The corresponding R², F and s values are 0.9349,
187 and 0.17 respectively (eq 4.5 and Figure 4-4).

S' = (2.56 ± 0.27) + (22.31 ± 1.34) μ/MV - (0.09 ± 0.02) E_HOMO-LUMO    (4.5)


Table 4-5. The Two-Parameter Correlations of S' for the 29 Diverse Solvents

Descriptors             R²        F      a0      a1       a2
μ/MV                    0.8957    232    1.54    24.02
E_HOMO-LUMO + μ/MV      0.9349    175    2.55    22.80    -0.09
E_LUMO + μ/MV           0.9067    126    1.64    21.78    -0.09
⁰AIC + μ/MV             0.9002    117    1.37    16.01     0.63
E_HOMO + μ/MV           0.8970    106    2.46    24.36     0.10

a0 is the intercept, a1 is the coefficient of μ/MV, and a2 is the coefficient of the
second descriptor.

Figure 4-4. Calculated vs. experimental S' of 29 diverse solvents using the best
weighted 2-parameter correlation equation (4.5). [Plot not reproduced; horizontal axis:
Experimental S'.]

Thus, the two principal intrinsic dimensions of S', dipolarity and polarizability, are
represented by the molar volume weighted total dipole moment of the molecule and the
HOMO-LUMO energy gap. The intercorrelation between the two descriptors is 0.005;
therefore, each descriptor describes a single, orthogonal molecular characteristic.
Since the two parameters can be easily obtained through quantum-chemical calculations,
the working set can now be extended to include more solvents.

Ideally, all 67 solvents could be included in the working set. However, caution has
to be taken at this stage, since in a previous report [96JCICS1162] the AM1
parameterization yielded an inadequate geometry and partial charge distribution for
nitrogen, sulfur and phosphorus containing molecules. In order to find out how good the
calculated total dipole moments and molar volumes are, the experimental data from
DIPPR and references [74MIm, 89MIm] are compared with the calculated quantities in
Table 4-6.

The correlation between the experimental and calculated dipole moments for 65
solvents (whose dipole moments are available in DIPPR, 74MIm and 89MIm) showed R² =
0.9344, F = 883, and s = 0.39 debye. The averaged absolute error was 0.29 debye.
Solvents containing nitro, nitrile and phosphate groups are recognized as the most distinct
outliers. The sulfur containing solvents do not seem to have any problem in this case.
The biggest absolute error, 2.00 debye, was contributed by nitrobenzene.

Table 4-6. The Experimental (Exp.) and Calculated (Cal.) Dipole Moments and Molar
Volumes and the Corresponding Absolute Error (Abs. s)

                             Dipole Moment            Molar Volume
Solvent                      Exp.    Cal.    Abs. s   Exp.     Cal.     Abs. s
1,1,1-Trichloroethane        1.78    1.81    0.03     100.3     90.4     9.9
1,1,2-Trichloroethane        1.26    1.26    0.00      93.0     90.2     2.8
1,1,2-Trichloroethene        0.77    0.84    0.07      90.1     83.5     6.6
1,2-Dichlorobenzene          2.50    2.21    0.29     113.2    117.8     4.6
1,4-Dioxane chair            0.42    0.12    0.31      85.7     89.8     4.1
2-Butanone                   2.76    3.12    0.36      90.2     85.0     5.2
3-Pentanone                  2.82    3.00    0.18     106.4    122.8    16.4
4-Butyrolactone              4.18    5.04    0.86      76.5     83.4     6.9
4-Methylpyridine             2.75    2.67    0.08      98.1    104.7     6.6
Acetone                      2.88    3.06    0.18      73.9     67.5     6.4
Acetonitrile                 3.92    3.35    0.57      52.7     50.3     2.4
Acetophenone                 3.02    3.37    0.35     117.4    128.8    11.4
Anisole                      1.36    1.44    0.08     109.2    118.3     9.1
Benzene                      0.03    0.00    0.03      89.5     91.1     1.6
Benzonitrile                 3.99    4.03    0.04     103.0    110.4     7.4
Bromobenzene                 1.70    1.66    0.04     105.6    110.2     4.6
Butyl acetate                1.84    2.18    0.34     132.6    129.1     3.5
Carbon disulfide             0.12    0.00    0.12      60.6     71.2    10.6
Carbon tetrachloride         0.00    0.01    0.01      97.1     87.6     9.5
Chlorobenzene                1.69    1.51    0.19     102.3    104.7     2.4
Cyclohexane                  0.00    0.00    0.00     108.9    105.7     3.2
Cyclohexanone                3.08    3.23    0.15     104.1    108.7     4.6
Dichloromethane              1.60    1.56    0.04      64.4     58.8     5.6
Diethyl ether                1.15    1.47    0.32     104.7     92.0    12.7
Di-iso-propyl ether          1.13    1.42    0.29     141.8    126.6    15.2
Dimethylaniline              1.68    1.67    0.01     127.6    139.0    11.4
Di-n-butyl ether             1.17    1.31    0.14     170.3    161.3     9.0
DMSO                         3.96    4.40    0.44      71.3     77.4     6.1
Ethyl acetate                1.78    1.80    0.01      98.5     94.2     4.3
Ethyl formate                1.93    2.00    0.07      80.8     76.0     4.8
Hexamethylphosphoramide      5.54    5.23    0.31     175.7    183.8     8.1
Hexyl acetate                1.86    2.18    0.32     163.9    163.9

Table 4-6 Continued.

                             Dipole Moment            Molar Volume
Solvent                      Exp.    Cal.    Abs. s   Exp.     Cal.     Abs. s

Methyl  acetate 

l.oo 

i.yv) 

O.Z3 

/y.y 

/O.O 

•J  1 
3.1 

N,N-Diethyl  formamide 

^.oo 

4.  It) 

fl  98 

1  1  A  7 

N,N-Dimethylacetamide 

3.0 1 

4.09 

O.ZO 

m  n 
93.0 

y /.4 

A  A 

4.4 

N,N-Dimethylcyanamide 

/111 
4. 1 1 

0.00 

fi  1  n 
oi.U 

N,N-Dimethylformamide 

A  T7 

0.45 

/  /.4 

Q  1  /< 

81.4 

A  n 
4.U 

n-Butyronitrile 

4.U7 

3.41 

0.6o 

o/.y 

o5.0 

Z.3 

n-Decane 

n  /in 

U.W.I 

yj.yij 

1Q^ 

1  87  A 
10  /  .o 

7  7 
/ .  / 

n-Heptane 

U.Wl 

1 47  n 

1  n 

1.)  J.V/ 

1 9  n 

1  z.u 

n-Hexane 

t).(M) 

0.00 

0.00 

131.3 

1 1 8.0 

111 

13.3 

XT*  A  1- 

Nitrobenzene 

4.18 

6.18 

2.00 

lOZ.  / 

1  ly.o 

lo.y 

Nitroethane 

4.0  J 

1  on 
i.zv; 

79  n 
/  z.w 

7A  4 

4  4 

Nitromethane 

/I 

4. JO 

1  in 
1 .  iu 

S4  O 

^0  9 

ny.z 

4 

H.J 

N-Methylimidazole 

1  1 1 

4.0,1 

n  X9 

8Q  4 

N-Methylpyrrolidinone 

A  MO 

/I  1  0 
4.  IZ 

n  n'^ 

yo.  / 

1  n4  8 

8  1 

0. 1 

n-CH  onanc 

1814 

169  9 

1 1  5 

n-r  eniane 

1^.1/  / 

n  ni 

0  0^ 

119  5 

99  3 

20  7 

r  ropioniuiie 

J.-iO 

70  Q 

UO.Vl 

9  Q 

rropyi  acetate 

l.oo 

0  17 

Z.  IZ 

n  94 

1111 

rynuine 

Z.ZD 

n  n/^ 

Q1  7 
y  1 .  / 

Quinoline 

z.lo 

n  1 1 
U.  13 

&n  fi 

fiA  7 
oo.  / 

^  Q 

j.y 

Tetrahydrofuran 

1.72 

2.06 

0.34 

1  1  0  c 

1 18.5 

\1A  1 

1 34. 1 

15.0 

Tetrahydropyran 

l.Dt) 

1.0.1 

n  n/^ 

8 1  Q 
oi.y 

811 
Ol.i 

f  i  8 

u.o 

Tetrahydrothiophene 

1.90 

2.33 

0.42 

88.4 

90.4 

2.0 

Tetramethylene  sulione 

4.69 

4.95 

O.ZO 

95.3 

105.7 

1  /\  A 

10.4 

letrametnylurea 

3.3o 

3.52 

U.  14 

ill  .y 

Thiophene 

0.40 

0.16 

79.5 

85.7 

6.2 

Toluene 

0.36 

0.31 

0.06 

106.6 

109.0 

2.4 

Tributyl  phosphate 

3.21 

2.93 

0.28 

279.2 

Trichloromethane 

1.01 

1.18 

0.17 

80.7 

73.1 

7.6 

Triethylamine 

0.75 

1.06 

0.31 

139.7 

130.8 

8.9 

Triethyl  phosphate 

3.21 

2.94 

0.27 

174.7 

Trimethylbenzene 

0.13 

0.08 

0.05 

139.5 

142.9 

3.4 

Trimethyl  phosphate 

3.18 

2.87 

0.31 

122.3 

Following the statistical rule, predicted values whose error exceeds 2.5 times the
averaged absolute error are not included in the final correlation analysis. Thus, 9
solvents were removed from the working set. The correlation between experimental and
calculated dipole moments for the subsequent set of 56 solvents had R² = 0.9803, F = 2641,
and s = 0.20 debye. The averaged absolute error was 0.19 debye. Note that
3-methylsulfolane and propylene carbonate were not included in these correlations due to
lack of experimental data. Nevertheless, these two solvents were kept in the working set
since they do not contain the atoms that the AM1 parameterization cannot handle.

The correlation between experimental and calculated molar volumes for 55 solvents
(for which the experimental values are readily available in DIPPR) had R² = 0.9296, F =
713, and s = 8.24 ml/mole. The averaged absolute error was 6.93 ml/mole. n-Pentane
was the only solvent whose absolute error exceeded 2.5 times the averaged absolute
error (17.3 ml/mole). After removing n-pentane from the set, the statistical quality
improved to R² = 0.9372, F = 790, and s = 7.84 ml/mole.
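
The 2.5-times rule used above is simple to apply mechanically. The sketch below is one
way to write it; the function names are hypothetical, and the example uses a handful of
dipole moment errors taken from the legible part of Table 4-6 together with the reported
averaged absolute error of 0.29 debye for the full 65-solvent set.

```python
def outlier_cutoff(mean_abs_error, factor=2.5):
    """Cutoff above which a point is excluded: factor times the averaged absolute error."""
    return factor * mean_abs_error

def split_by_cutoff(abs_errors, mean_abs_error, factor=2.5):
    """Separate solvents into those kept and those dropped by the cutoff rule."""
    cutoff = outlier_cutoff(mean_abs_error, factor)
    kept = {name: err for name, err in abs_errors.items() if err <= cutoff}
    dropped = {name: err for name, err in abs_errors.items() if err > cutoff}
    return kept, dropped

# Absolute dipole moment errors (debye) for a few solvents from Table 4-6.
dipole_errors = {
    "Acetone": 0.18,
    "Acetonitrile": 0.57,
    "4-Butyrolactone": 0.86,
    "Benzene": 0.03,
    "Nitrobenzene": 2.00,
}
kept, dropped = split_by_cutoff(dipole_errors, mean_abs_error=0.29)
```

With an averaged absolute error of 0.29 debye the cutoff is 0.725 debye, so of these five
solvents only 4-butyrolactone and nitrobenzene are dropped. For the molar volumes, the
same rule with 6.93 ml/mole gives the 17.3 ml/mole cutoff quoted in the text.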

Ten solvents were removed from the working data set as a result of the imprecision of
the calculated data. The remaining 57 solvents were submitted for multilinear regression
analysis using the calculated molar volume weighted dipole moment and the HOMO-LUMO
energy gap. The resulting R² was 0.9024 with F = 245 and s = 0.18. Removing solvents
whose error exceeds 2.5 times the averaged absolute error (0.18) eliminates
1,1,2-trichloroethane, 1,4-dioxane, the alkane chains and all solvents containing a
phosphate group. Alkane chains do not have very accurate S' values because of the
aggregation of most spectral probes in those solvents. Solvents containing a phosphate
group have calculated dipole moments that are well off the averaged absolute value
(Table 4-6). Both 1,1,2-trichloroethane and 1,4-dioxane have two conformations that are
associated with big total dipole moment differences, and these solvents will be discussed
in detail later. The regression analysis for the remaining 48 solvents had R² = 0.9581,
F = 514, s = 0.11 (eq 4.6) with an averaged absolute error of 0.10.

S' = (2.45 ± 0.15) + (23.48 ± 0.81) μ/MV - (0.08 ± 0.01) E_HOMO-LUMO    (4.6)


Equation (4.6) compares favorably with the weighted correlation shown in Table
4-5 for the set of 29 solvents first studied. The same regression analysis was repeated for
a set (the comparing set) of 55 solvents (for which the experimental dipole moments and
molar volumes are available in DIPPR, 74MIm and 89MIm) using the experimental dipole
moments and molar volumes, and the calculated HOMO-LUMO energy gap. This gave R² =
0.9279, F = 335, and s = 0.16, with an averaged absolute error of 0.15. Noticeably,
1,4-dioxane and 1,1,2-trichloroethane are the most distinct outliers. It is difficult to
believe that the AM1 parameterization could not handle these molecules. It is also
unlikely that any of the cited experimental data is questionable, since more than one
report [74MIm] gives similar results. After removing the outliers whose predicted S'
errors exceed 2.5 times the averaged absolute error, the correlation is refined to
R² = 0.9502, F = 429, s = 0.13, with an averaged absolute error of 0.12 (eq 4.7). The
statistical quality is similar to that obtained with the calculated descriptors, which
adds confidence in Equation (4.6).

S' = (2.39 ± 0.15) + (24.74 ± 0.88) μ/MV - (0.07 ± 0.01) E_HOMO-LUMO    (4.7)

The S' value for 1,4-dioxane calculated using equations (4.5), (4.6) and (4.7) gave
absolute errors of 0.48, 0.46, and 0.37 respectively. This is unacceptable since the
observed S' for 1,4-dioxane is very well known, being weighted at 1 on the 0 - 1 scale.
Earlier empirical solvent polarity scales [96GCI307] also encountered problems with
1,4-dioxane. A higher value of the total dipole of the molecule would help to bring the
S' of 1,4-dioxane closer to its experimentally derived value (1.93). The boat
conformation of 1,4-dioxane was calculated, leading to a dipole moment of 1.79, a μ/MV
of 0.021, and an E_HOMO-LUMO of 13.0. These properties of the molecule in its boat
conformation lead to an excellent prediction of S' (Table 4-7).

The boat conformation of 1,4-dioxane may not be the predominant species in
solution, as MOPAC calculations show the chair conformation is 2.1 kilocalories more
stable than the boat. However, in the vicinity of a solute molecule surrounded by
1,4-dioxane, the added dipole-dipole interaction with the solute can change the
conformation of the 1,4-dioxanes forming the cavity wall. Alternatively, the dioxane in
the cavity wall is oriented with one of its oxygens pointing towards the solute, and the
dipole-dipole solute-solvent interaction is better approximated by the boat than by the
zero dipole moment of the chair form. When the solute is not very polar, a different
orientation of dioxane in the cavity walls could lead to solvation that is much less than
that predicted by S'.

According to the unified solvation model [94MId], the liquid medium is a
distribution of clusters of various sizes in which solvent molecules are held together in
orientations similar to those in the corresponding solid state. When a solute is immersed
in a solvent, it replaces a solvent molecule in a cluster and orients so as to maximize the
solute's dispersion and polar interactions with the solvent. The replacement of a solvent
molecule by a solute molecule changes the polarity of the solvent cluster, particularly in
the region surrounding the solute. Even though the AM1 calculation in this work did
incorporate the reaction field, the net result of such a calculation is merely an
up-scaled partial charge and dipole moment for all molecules [96JCICS1162, 96MIpc], which
does not include changes in the dipole moment of a solvent due to changes in molecular
conformation induced by solvent-solute interactions. Such changes can be simulated by
varying the molecular conformation, which is easily accomplished with computer
modeling. This simplifies the exploration of peculiar solvent effects that are otherwise
inaccessible to experimental techniques. A systematic simulation of the variation of
macroscopic solvent properties due to solvent-solute interactions may help to improve the
solvent functions and parameters proposed by Kirkwood, Onsager, Hildebrand and
Hansen.

In the same manner, the peculiar dipole moment of 1,1,2-trichloroethane is
re-examined. As the MOPAC calculations show, the trans and cis conformations have similar
final heats of formation. Following the same argument made for dioxane, the bulk
macroscopic properties of the trans conformer do not represent the properties in solution,
because solute-solvent interactions are maximized during solvation. Such a consideration
is consistent with the observation that the trans conformation fits the experimental
dipole moment well but not S' (Table 4-7), while the cis conformation fits S' well but not
the experimental dipole moment (Table 4-7).

Table 4-7. The Calculated Total Dipole Moments for the Different Conformations of
1,4-Dioxane and 1,1,2-Trichloroethane Using the Weighted Correlation Equations (4.5),
(4.6) and (4.7)

Solvent                        μ       μ/MV     E_HOMO-LUMO   S' eq 4.5   S' eq 4.6   S' eq 4.7
boat 1,4-dioxane               1.79    0.021    13.0          1.89        1.93        n/a*
chair 1,4-dioxane              0.12    0.001    12.9          1.45        1.47        1.56
cis-1,1,2-trichloroethane      2.57    0.030    11.8          2.18        2.23        n/a*
trans-1,1,2-trichloroethane    1.26    0.015    11.7          1.85        1.88        1.94

*The experimental dipole moment values are not available for these particular conformations.
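
The entries of Table 4-7 can be checked by evaluating equations (4.5) to (4.7) directly.
The sketch below uses the rounded coefficients printed in those equations, so it matches
the tabulated S' values only to within a few hundredths; the names EQUATIONS and predict_s
are hypothetical.

```python
# (intercept, coefficient of mu/MV, magnitude of the E_HOMO-LUMO coefficient),
# copied from equations (4.5), (4.6) and (4.7) as printed.
EQUATIONS = {
    "4.5": (2.56, 22.31, 0.09),
    "4.6": (2.45, 23.48, 0.08),
    "4.7": (2.39, 24.74, 0.07),
}

def predict_s(mu_over_mv, homo_lumo_gap, eq="4.6"):
    """S' = a0 + a1 * (mu/MV) - a2 * E_HOMO-LUMO for the chosen equation."""
    a0, a1, a2 = EQUATIONS[eq]
    return a0 + a1 * mu_over_mv - a2 * homo_lumo_gap

# Descriptor values for the two dioxane conformers, taken from Table 4-7.
boat_dioxane = predict_s(0.021, 13.0, eq="4.6")
chair_dioxane = predict_s(0.001, 12.9, eq="4.6")
```

With these inputs the boat conformer comes out near 1.9 and the chair conformer near
1.44, so the boat value lies much closer to the experimentally derived S' of 1.93, in
line with the discussion above.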


From the discussion above, the molecular descriptors generated from the boat
conformation of 1,4-dioxane and the cis conformation of 1,1,2-trichloroethane were
considered appropriate to reflect the real molecular properties of these molecules in
solution towards polar solutes. Thus, the boat dioxane and the cis-1,1,2-trichloroethane
were added to the working set. The subsequent correlation for the refined working set has
R² = 0.9587, F = 534, and s = 0.11 (Figure 4-5, eq 4.8). Table 4-8 provides the successive
regression coefficients for each descriptor and the respective standard errors, the R² and
F values, and the t-test for equations (4.5), (4.7), and (4.8). Evidently, the
coefficients of the two descriptors are very similar in all cases, and the volume weighted
total dipole of the molecule and the HOMO-LUMO energy gap account for most of the variance
of the S' data in all equations, which adds confidence in the reliability of the MQSPR
models.

Figure 4-5. Calculated vs. experimental S' of 50 diverse solvents using the best
weighted 2-parameter correlation equation (4.8). [Plot not reproduced; horizontal axis:
Experimental S'.]

Table 4-8. The Two-Parameter Correlations for S' of the Empirical Set (29 Solvents),
the Finalized Working Set (50 Solvents) and the Comparing Set (48 Solvents).

Data Set     Descriptor      X ± ΔX           t-test    R²        F
empirical    Intercept        2.61 ± 0.22     11.82
             cal. μ/MV       23.10 ± 1.12     20.58     0.9087    269
             E_HOMO-LUMO     -0.09 ± 0.02     -5.30     0.9562    284
working      Intercept        2.45 ± 0.15     16.38
             cal. μ/MV       23.42 ± 0.80     29.28     0.9242    573
             E_HOMO-LUMO     -0.08 ± 0.01     -6.20     0.9587    534
comparing    Intercept        2.39 ± 0.15     15.86
             exp. μ/MV       24.77 ± 0.88     28.20     0.9246    564
             E_HOMO-LUMO     -0.07 ± 0.01     -5.60     0.9502    429

S' = (2.45 ± 0.15) + (23.42 ± 0.80) μ/MV - (0.08 ± 0.01) E_HOMO-LUMO    (4.8)


Equation (4.8), developed from 50 diverse solvents, was identified as the finalized
correlation equation. According to this equation, S' primarily represents the
size-independent dipole strength and the dispersion ability of the solvents. This is the
case for both the basic polar solvents and the less polar solvents. It is worth
emphasizing that the weights of the S' values were incorporated at the beginning to avoid
a garbage-in, garbage-out correlation, not to inflate the correlation coefficient. In
fact, the unweighted correlation for equation (4.8) has R² = 0.9058, F = 221, s = 0.14.
The S' values calculated using the weighted correlation equation (4.8) compare favorably
with the experimentally derived values listed in Table 4-9.

Table 4-9. The Experimental vs. Calculated S' Using Equation 4.8 and Table 4-2 for 67
Solvents


no  Solvent 

rormuia 

'J  exp. 

'J  eq4.g 

c 

iJ  toble  4-2 

Wt* 

1   1,1,1- incnioroeinane 

l-l3\_-\^ll3 

1 

9  03 

1  1  1 

1 

1 

z  1 , 1  ,Z- 1  ncnioroetnane 

uiri2'-iiL-i2 

2.35 

9  9"^ 

0  9 

J    1,1  ,z- 1  riLniorocincric 

1  on* 

1  92 

1.62 

0.4 

H  i,z-i-'icniorouenzcnc 

1  9-Pi,r^H. 

2  10 

2  16 

2.09 

1 

J   1  ,'i-iJioxane 

Vy^.^_^2V-^2  ^2*-' 

1  93 

1  59 

0 

o  z-DUianone 

i^n3V.-\^V-/  ^\...2n5 

Z.3  1 

2  46 

2  38 

0.8 

/  j-jvieinyisuuoiane 

2  55 

2  57 

3.50 

0.4 

0  j-rentanone 

L.2rl5*-v-'^2"5 

9  98 

9  20 

0  8 

9  4-Butyrolactone 

2.86 

9  06 

3  04 

0  8 

10  4-Methylpyridine 

/I  /^U  P  14  M 

Z.J  1 

9  98 

9  30 
z*.  jw 

0  8 

11  Acetone 

(Ctl3)2<-U 

0  ^8 
Z.Do 

9  fiO 

1 

i 

12  Ace  torn  tnle 

PH  p\r 
L-ri3t^rN 

^  no 

3  00 

3  02 

1 

1 

13  Acetophenone 

L.6rl5L-VUjL.rl3 

9  SO 

9  3^ 

2  48 

0  8 

14  Anisole 

l_6ri6^J*^ri3 

9  00 

2  02 

0.6 

15  Benzene 

*^6ll6 

1  73 

1  65 

1.72 

1 

lo  isenzoniiTiie 

2  63 

2.59 

2.42 

1 

1  /  Dromuocii^ciic 

2.10 

2.06 

1.96 

0.4 

1  O                      •!  ny->A4-n«-A 

lo  Butyl  acetate 

1  00 

1  01 

2  05 

0.4 

ly  L^Don  aisuiiiae 

1  51 

1  78 

1.49 

0 

20  Carbon  tetrachloride 

/^P1 

1  AQ 

1  .J  0 

1  42 

1 

1 

zi  L.moroDenzene 

P,H,P1 

1  08 

2  04 

2.12 

0.8 

zz  uycionexane 

P,H., 

1  1 1 

1  33 

1.40 

1 

23  Cyclohexanone 

r'PH-lrPO 

i,L,rl2;5v-v-' 

0  1^* 

2  30 

2.27 

1 

Z4  ui-n-DUiyi  emer 

<'n-P^l-In'>^0 
(,Il-\^4n9^2v-' 

1  58 

1.70 

1.64 

0.8 

Zj  ui-n-propyi  emer 

i  Pr^n 
l-rT2'-' 

1  76 

1.62 

1.68 

0.6 

zo  uicniorometnane 

i_i_i2n2 

9  08 

2  17 

2.16 

0 

27  Diethyl  ether 

(C2H5)20 

1  73 

1  79 

1.82 

1 

28  Dimethylaniline 

C6H5N(CH3)2 

1  OA* 

1.96 

9  04 

9  00 

0  6 

29  DMSO 

(CH3)2SO 

'I  on 

09 

1 

1 

30  Ethyl  acetate 

CHjCCOOCzHs 

0  1  <; 
z.  1 J 

1  OS 

9  08 

1 
1 

31  Ethyl formamide            HC(O)NEt2           2.59     2.45    2.39    0.6
32  Ethyl formate              HCOOC2H5            2.24     2.12    2.33    0.4
33  Hexamethylphosphoramide    [(CH3)2N]3PO        2.52     2.27    2.63    0
34  Hexyl acetate              CH3C(O)Ohex         1.94     1.82    1.96    0.4
35  Methyl acetate             CH3C(O)OCH3         2.35"    2.08    2.24    0.6
36  N,N-dimethylacetamide      CH3CON(CH3)2        2.70"    2.62    2.55    1

Table 4-9 Continued.

no  Solvent                    Formula             S' exp.  S' eq 4.8  S' Table 4-2  Wt.
37  N,N-Dimethylcyanamide      (CH3)2NCN           2.81"    2.79    2.74    0.2
38  N,N-Dimethylformamide      HCON(CH3)2          2.80     2.86    2.72    1
39  n-Decane                   C10H22              0.90     1.32    1.45    0.2
40  n-Heptane                  C7H16               0.79     1.33    1.46    0.2
41  n-Hexane                                       0.68     1.31    1.43    0.2
42  N-Methylimidazole                              2.60     1.31    2.80    0.8
43  N-Methylpyrrolidinone      CH2CH2CH2CONCH3     2.62     2.90    2.49    0.6
44  n-Nonane                   C9H20               0.90     2.55    1.46    0.2
45  n-Butyronitrile            CH3CH2CH2CN         2.70     2.40    2.50    0.4
46  n-Pentane                  C5H12               0.57     1.30    1.45    0.2
47  Nitrobenzene               C6H5NO2             2.61     2.97    2.72    1
48  Nitroethane                C2H5NO2             2.78"    3.09    2.65    0.2
49  Nitromethane               CH3NO2              3.07     3.41    2.96    0.8
50  Pentahydropyran            (CH2)5O             1.98     2.61    1.85    0.6
51  Propionitrile              C2H5CN              2.80     1.96    2.67    0.6
52  Propyl acetate             CH3C(O)OPr          2.05     3.21    1.98    0.4
53  Propylene carbonate        (CH2)3(O-)2CO       3.10     2.29    3.38    0.6
54  Pyridine                   C5H5N               2.44     2.16    2.32    1
55  Quinoline                                      2.30     2.04    2.12    0.2
56  Tetrahydrofuran            (CH2)4O             2.08     1.81    2.04    1
57  Tetrahydrothiophene        (CH2)4S             1.99"    2.33    1.97    0.2
58  Tetramethylene sulfone     (CH2)4SO2           2.88"    2.72    3.66    0.4
59  Tetramethylurea            [(CH3)2N]2CO        2.48"    2.26    2.31    0.6
60  Thiophene                  (CH)4S              1.83"    1.82    1.84    0.2
61  Toluene                                        1.66     1.75    1.77    0.8
62  Tributyl phosphate         (n-C4H9O)3PO        2.30     1.78    2.46    0.4
63  Trichloromethane           CCl3H               1.74'    1.95    1.88    0
64  Triethylamine              (C2H5)3N            1.43     1.71    1.66    1
65  Triethyl phosphate         (C2H5O)3PO          2.55"    1.92    2.58    0.6
66  Trimethylbenzene                               1.54^'   1.70    1.66    0.4
67  Trimethyl phosphate        (CH3O)3PO           2.79"    2.07    3.18    0.2

S' values are collected from four sources: "[94MId] Drago, R. S. Applications of
Electrostatic-Covalent Models in Chemistry 1994, Surfside Scientific Publishers:
Gainesville, FL. [96JCSPTIIsub] Bustamante, P.; Drago, R. S. J. Chem. Soc. Perkin Trans. 2
1996 (submitted). '[96IC239] George, J. E.; Drago, R. S. Inorg. Chem. 1996, 35, 239-241.
*[96MIpc]. The weights are cited from references [96JCSPTIIsub] and [96MIpc].

Conclusion

The MQSPR model of the unified nonspecific solvent polarity scale was developed
using weighted and unweighted correlations. The molar volume weighted total dipole
moment of the molecule and the HOMO-LUMO energy gap are involved in the weighted
two-parameter correlation. These two descriptors closely resemble the intrinsic
dimensions of S', which are recognized as dipolarity and polarizability. The unweighted
three-parameter correlation involves the molar volume weighted total dipole moment of
the molecule, the fractional partial negative surface area and the maximum coulombic
interaction for an N-O bond. Both correlation equations allow the prediction of S' for
structurally diverse solvents. All parameters in the correlation equations are derived
solely from the quantum-chemically calculated descriptors of the solvents. The prediction
of S' no longer requires several measurements of physical and chemical quantities. The
correlation and analysis of nonspecific solvent effects can thus be applied more
conveniently in the development of new technologically and biomedically important
solvents.


CHAPTER  V 
CONCLUSION 


A fundamental objective of scientific research is to explore unifying relationships
among experimentally observed data. Relationships such as the ideal gas law,
Maxwell's theory of the electromagnetic field and the Schrödinger equation are classical
examples of the most conspicuous significance. Relationships between the physical and
chemical properties of a compound and its structure are of increasing importance
in today's chemistry, medicinal science and biological technology. The primary
importance of QSPR has been recognized in predicting unknown properties and in
guiding drug design and the choice of synthetic route. The work discussed in this
dissertation marginally improves the use of QSPR to provide insights into the
physical nature of the inter- and intramolecular interactions related to the
experimentally observed properties.

It is evident that the boiling point models presented in this dissertation generalize
a very important relationship between the structure of a compound and its boiling
temperature. The boiling point is primarily determined by the gravitation index, a
mass/bulk cohesiveness descriptor, and by the surface area weighted partial charges on
the hydrogen donor atoms. By incorporating two more descriptors, a match between the
prediction errors and the experimental uncertainty is achieved. These models provide
a clear insight into just how the structure affects the boiling point and into the relevant inter-
and intramolecular interactions in bulk liquids. Beyond the remarkable enhancement of
the statistical quality over the previously achieved level, these models can provide a
structural explanation for a long-puzzling simple observation regarding the boiling points of
cyclic compounds vs. their open-chain analogues.

It was believed and stated in standard textbooks that a heavier hydrocarbon boils
at a higher temperature than a lighter one. However, cyclohexane, with a formula mass
of 84 g/mol, boils at 353.9 K, while n-hexane has a formula mass of 86 g/mol and
boils 12 degrees lower. The same trend holds for many other cyclic compounds and
their open-chain analogues. Such a contradiction to the general statement in textbooks can be
simply explained by the presented boiling point models. According to those models, the
dispersion and cavity formation energy in bulk liquids is related not only to the mass of
the molecule, but also to its distribution in the molecular space; i.e., the gravitation index
(masses and their distribution), not mass alone, affects boiling. Thus, cyclohexane, with a
higher gravitation index than n-hexane, boils at a higher temperature.
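
The dependence of the gravitation index on mass distribution, and not on mass alone, can be illustrated with a minimal sketch. It assumes the all-atom-pairs form G = sum over i<j of m_i m_j / r_ij^2, which is not spelled out in this chapter. The function name and the two toy geometries (four unit masses in a line and in a square) are hypothetical and do not represent real molecules.

```python
from itertools import combinations

def gravitation_index(masses, coords):
    """Sum of m_i * m_j / r_ij**2 over all atom pairs (assumed definition)."""
    total = 0.0
    for i, j in combinations(range(len(masses)), 2):
        # squared Euclidean distance between atoms i and j
        r_squared = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
        total += masses[i] * masses[j] / r_squared
    return total

masses = [1.0, 1.0, 1.0, 1.0]
# Hypothetical toy geometries with unit spacing, not real molecular structures.
chain = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
ring = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]

g_chain = gravitation_index(masses, chain)
g_ring = gravitation_index(masses, ring)
print(g_chain, g_ring)
```

With equal total mass, the compact arrangement gives the larger index because more atom pairs lie at short distances, which is the qualitative argument made above for cyclic compounds.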

Accordingly,  the  one-  and  two-parameter  regression  models  achieve  remarkable 
predictions  for  the  Ostwald  solubility  coefficients  of  hydrocarbon  gases/vapors  in  water 
and  hexadecane.  The  one-parameter  model  showed  that  the  bulk  cohesiveness  of  a  solute 
has  a  predominant  influence  on  the  solvation  of  hydrocarbon  gases/vapors  in  hexadecane. 
The  two-parameter  model  indicated  that  the  solvation  of  hydrocarbon  gases/vapors  in 
water  is  determined  by  the  bulk  cohesiveness  and  the  branching  ratio  of  the  solutes.  The 
five-parameter  theoretical  model  is  physically  justified  for  diverse  gases/vapors  and 
mainly  involves  hydrogen-bonding  and  polarizability  descriptors.  Besides  providing  a 
useful tool for the prediction of solubilities, the five-parameter model suggests different
mechanisms  that  determine  the  solvation  of  polar  vs.  non-polar  solutes  in  water. 

The  potential  of  using  the  QSPR  technique  as  a  tool  for  seeking  the  physical 
meaning  and  intrinsic  dimensions  of  an  empirical  scale  is  demonstrated  by  the  successful 
investigation of the nonspecific solvent polarity scale (S'). The revealed dimensions of
S',  namely,  the  volume  weighted  total  dipole  moment  of  the  molecule  and  the  HOMO- 
LUMO  energy  gap,  not  only  assist  in  the  comprehension  of  the  unified  non-specific 
solvation,  but  also  provide  quantum-chemically  calculable  criteria  for  solvent  selection 
and  analysis. 

A  similar  approach  can  be  applied  to  seek  the  theoretical  representations  of  the 
solute  susceptibility  and  the  donor/acceptor  interactions  in  Drago's  unified  solvation 
model.  If  the  solute  susceptibility  and  donor/acceptor  interactions  can  be  represented  by 
unidimensional  independent  molecular  descriptors,  a  set  of  theoretical  parameters  that 
account  for  molecular  features  of  both  solvents  and  solutes  will  be  obtained. 
Consequently, the solute/solvent based properties can be predicted by a universal
solvation model which is not restricted to compounds with known spectroscopic
experimental data or to solutes of specific types.

Despite the diversity and complexity of the internal structures and
intermolecular interactions in disordered condensed media, several descriptors presented
in this dissertation are consistent in describing certain intermolecular interactions. The
gravitation index effectively describes the intermolecular dispersion forces in the bulk
solvent media and in non-polar solutions. Chapter II demonstrates its predominant
influence in the short-range dispersion and repulsion interactions at the boiling point and
in the critical state. Chapter III further shows its great importance in the solvation of
hydrocarbon gases and vapors in water and hexadecane. Chapter III also reveals that the
dispersion and cavity formation interactions represented by the gravitation index are less
apparent in a polar solution. The hydrogen-bonding descriptors, defined as different
variants of surface area-weighted hydrogen donor partial charges, effectively represent
the  specific  hydrogen-bonding  effects  throughout  the  present  work.  The  volume- 
weighted  total  dipole  moment  of  a  molecule  describes  the  dipolarity  of  a  solvent.  The 
HOMO-LUMO  energy  gap  represents  the  polarizability  of  solvents.  The  most  negative 
partial  charges  reflect  the  electrostatic  cohesiveness  of  a  molecule. 

Most importantly, the quantum-chemical calculations employed in QSPR allow
the exploration of some solvent effects that cannot otherwise be explored by experimental
techniques. For example, although it is impossible to calculate analytically the exact
change of permittivity due to solute-solvent interaction, it is feasible to simulate the
appropriate conformation that adequately represents the interactions between solute and
solvent. Such considerations have the potential to improve the electrostatic models of
Onsager and Kirkwood and the solubility parameters of Hildebrand and Hansen.

Being  able  to  determine  and  represent  the  intrinsic  dimensions  of  an  empirical 
system  has  extended  our  ability  to  understand  and  to  develop  the  theory  for  empirical 
systems.  The  strategies  presented  here  can  be  applied  to  many  other  physical,  chemical 
and  biological  properties  such  as  melting  points,  toxicities,  sorption  coefficients,  etc. 
Moreover, predicting and handling the features that control a not yet fully developed
system becomes possible. For example, the sorption behavior of pesticides in a soil/water
system is extremely complicated and thus very difficult to monitor. If molecular
descriptors that truly represent the sorption characteristics can be found, a more
comprehensive theory of sorption may be developed. Moreover, the sorption behavior of
various pesticides in many soil/water systems can then be modeled accordingly. Thus,
the future of calculating empirical macroscopic properties and scales from
molecular first principles may ultimately be based on the development of QSPR.

As  scientists,  today  we  are  increasingly  urged  to  do  science  of  relevance  to  society. 
QSPR  allows  us  to  carry  out  science  that  will  undoubtedly  help  us  to  make  the  production 
of  new  molecules  useful  in  all  facets  of  life  and  society  more  cheaply,  more  efficiently  and 
in  a  more  environmentally-friendly  manner.  At  the  same  time,  it  offers  the  highest 
intellectual  challenges  for  the  development  of  meaningful  relationships,  novel  theories  and 
more profound understanding of the molecular nature of the world in which we live.


APPENDIX  I 

THE  COMPREHENSIVE  DESCRIPTORS  IMPLEMENTED  IN  CODESSA 


The descriptors provided by CODESSA calculations include the numbers of
atoms and bonds, molecular weight, the gravitation index [96RRCsub], Wiener indices
[47JACS17], Randic connectivity indices [75JACS6609], Kier and Hall connectivity
indices [96MIh], information content indices [88RCR191], molecular volume, shadow
indices [87ACA99], and numerous quantum-chemical indices extracted from the AM1 or
PM3 output [96RRCsub, 94AC1799, 94CT17, 96CR1027]. The CPSA descriptors
proposed by Jurs et al. [90AC2323, 92JCICS306] are also included in the CODESSA
program; AM1-calculated atomic partial charges were used to calculate these descriptors.
The  quantum-chemical  descriptors  used  in  this  work  included  the  most  positive  and  the 
most  negative  Mulliken  net  atomic  charges,  frontier  molecular  orbital  (FMO)  energies, 
and  the  respective  Fukui  FMO  nucleophilic,  electrophilic  and  one-electron  reactivity 
indices.  The  total  dipole  moment  of  the  molecule,  dipole  moment  components,  and 
molecular  bond  orders  were  used  as  descriptors.  Additional,  more  specific,  descriptors 
of  this  type  included  the  valence  state  energies  of  atoms  and  total  Coulombic  and 
exchange  energies  between  atoms  in  the  molecule.  The  zero-point  energy,  the  calculated 
electronic  and  vibrational  transition  energies,  the  rotational,  vibrational,  translational, 
internal,  and  total  enthalpies,  entropies,  and  heat  capacities  were  also  used. 


APPENDIX II

THE  TWO  PROCEDURES  BASED  ON  THE  LINEAR  REGRESSION 
TECHNIQUE  TO  FIND  THE  BEST  QSPR 
MODEL  OF  A  GIVEN  SIZE 

In both procedures described below, a preselection of descriptors was
applied. Descriptors for which values were not available for every structure in
the data set in question were discarded. Descriptors having a constant value for all
structures in the data set were also discarded.
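
The preselection described above can be sketched as follows. This is a minimal illustration and not the CODESSA implementation: the function name, the dictionary layout and the use of `None` to mark an unavailable value are assumptions made for the example.

```python
def preselect_descriptors(descriptors):
    """Keep descriptors that are defined for every structure and not constant.

    descriptors: dict mapping a descriptor name to a list of values,
    one per structure; None marks a missing value (assumed convention).
    """
    kept = {}
    for name, values in descriptors.items():
        if any(v is None for v in values):
            continue  # not available for every structure
        if len(set(values)) <= 1:
            continue  # constant over the whole data set
        kept[name] = values
    return kept

# Hypothetical descriptor pool for three structures.
pool = {
    "volume": [1.0, 2.0, 3.0],
    "n_rings": [0, 0, 0],          # constant: discarded
    "dipole": [1.2, None, 0.4],    # missing value: discarded
}
selected = preselect_descriptors(pool)
print(sorted(selected))
```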

The  first  strategy  used  to  develop  physically  meaningful  multilinear  QSPR 
equations  from  the  very  large  pool  of  descriptors  is  a  combination  of  the  possible 
regressions  and  forward  selection  procedures  [66MI].  This  strategy  involved  the 
following  steps: 

(1) All orthogonal pairs of descriptors i and j (with pair correlation coefficient
$R_{ij} < R_{\min}$) were found in a given descriptor pool. The chance for the absolute
orthogonality of two descriptors is negligible and $R_{\min}$ was therefore defined as a
practical limit for two descriptors being approximately orthogonal. The value of
$R_{\min} = 0.1$ was used throughout this work.

(2) The complete set of two-parameter regression equations utilizing all the
orthogonal pairs of descriptors obtained in step 1 was then found for the property
studied. The $N_q$ (< 400) significant pairs with the highest multi-linear regression
correlation coefficients were chosen for the next step.

(3) For each significant descriptor pair ij obtained in the previous step, a non-collinear
descriptor scale, k (with $R_{ik} < R_{nc}$ and $R_{jk} < R_{nc}$), was added, and the
corresponding three-parameter regression treatment was performed. When the Fisher
criterion at a given probability level, F, was smaller than that for the best two-parameter
correlation, the latter was chosen as the final result and the search terminated; otherwise
the $N_q$ (< 400) descriptor triples with the highest regression correlation coefficients were
considered in the next step. The non-collinearity limit, $R_{nc}$, is a subjective parameter and
has to be of a value higher than $R_{\min}$ in order to incorporate a more substantial part of the
descriptor space. Different $R_{nc}$ values were tested in the present treatment and $R_{nc} = 0.6$
was chosen as leading to the most stable correlations.

(4) For each significant descriptor set obtained in the previous step, an additional
non-collinear descriptor scale was added, and the appropriate (n+1)-parameter regression
treatment was performed. When the Fisher criterion at the given probability level, F, (or
the cross-validated correlation coefficient, $R_{cv}$) obtained for any of these correlations was
smaller than that for the best correlation of the previous rank, the latter was designated as
the final result and the search was terminated. Otherwise, the $N_q$ descriptor sets with the
highest regression correlation coefficients were stored, and the current step was repeated
with the number of parameters again increased by one (n = n + 1).

The final result had, therefore, the maximum value of the Fisher criterion and the
highest value of the cross-validated correlation coefficient. According to these statistical
criteria, it was considered the best representation of the property in the given (large)
descriptor space.
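
Steps 1 and 2 of this first strategy can be sketched as follows. This is a hedged illustration and not the CODESSA code: all function names and the toy data are hypothetical, and the absolute value of the pair correlation coefficient is compared with the limit, which is an assumption. The two-parameter squared correlation coefficient is computed from the three pairwise Pearson coefficients using the standard closed-form expression for two predictors with an intercept.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def orthogonal_pairs(descriptors, r_min=0.1):
    """Step 1: descriptor pairs whose |R_ij| is below the limit R_min."""
    return [(a, b) for a, b in combinations(descriptors, 2)
            if abs(pearson(descriptors[a], descriptors[b])) < r_min]

def r2_two_parameter(x1, x2, y):
    """R^2 of y = b0 + b1*x1 + b2*x2 from the pairwise correlations."""
    r1, r2, r12 = pearson(x1, y), pearson(x2, y), pearson(x1, x2)
    return (r1 * r1 + r2 * r2 - 2.0 * r1 * r2 * r12) / (1.0 - r12 * r12)

def best_pairs(descriptors, y, r_min=0.1, keep=400):
    """Step 2: rank the orthogonal pairs by R^2 and keep the best ones."""
    scored = [(r2_two_parameter(descriptors[a], descriptors[b], y), a, b)
              for a, b in orthogonal_pairs(descriptors, r_min)]
    scored.sort(reverse=True)
    return scored[:keep]

# Hypothetical data: d1 and d2 are exactly orthogonal, d3 is not.
pool = {"d1": [-1, 1, -1, 1], "d2": [-1, -1, 1, 1], "d3": [1, 2, 3, 4]}
prop = [2 * a + 3 * b for a, b in zip(pool["d1"], pool["d2"])]
pairs = orthogonal_pairs(pool)
ranked = best_pairs(pool, prop)
print(pairs, ranked)
```

Steps 3 and 4 would extend each retained pair with one further descriptor that satisfies the non-collinearity limit and would stop once the Fisher criterion or the cross-validated correlation coefficient no longer improves. Those steps are omitted from the sketch.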

The  second  strategy  used  to  develop  the  best  multi-linear  QSPR  equations  is  based 
on  the  stepwise  regression  procedure  [66MI].  This  approach  uses  the  following 
procedure: 

(1) To reduce the number of descriptors in the starting set, descriptors were
eliminated if: (a) the F-value for the one-parameter
correlation with the descriptor was below 1; (b) the correlation coefficient for the
one-parameter equation was less than $R_{\min}$, a user-defined value for insignificant correlations
($R_{\min} = 0.1$ in the present work); (c) the t-value for the descriptor in the
one-parameter correlation was less than $t_1 = 1.5$; or (d) the descriptor was highly intercorrelated
with another descriptor that had a higher single-parameter correlation
coefficient for the given property.

(2) All two-parameter regression models with the remaining descriptors were
calculated and ranked by their correlation coefficients $R^2$. The best two-parameter models
were subjected to the regression procedure of step 3.

(3) Each of the remaining descriptors not significantly correlated (correlation
coefficient below 0.8) [94MIg] with any of the descriptors already in the model was
added, in turn, to the current n-parameter model, and the resulting (n+1)-parameter
models were tested. The best 10 (n+1)-parameter models were again submitted to the
same procedure.

(4) When the optimum number of parameters in a model was reached, the
correlation equations with highest correlation coefficients and with the highest F-test
values were selected.
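
Step 3 of this second strategy, one round of model extension, can be sketched as follows. This is only an illustration under stated assumptions: the function names and the toy data are hypothetical, the intercorrelation test uses the absolute Pearson coefficient, and the models are ranked by $R^2$ obtained through Gram-Schmidt orthogonalization of the centered descriptor columns, a numerical choice made here for self-containment and not taken from the text.

```python
import math

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pearson(x, y):
    xc, yc = center(x), center(y)
    return dot(xc, yc) / math.sqrt(dot(xc, xc) * dot(yc, yc))

def r_squared(columns, y):
    """R^2 of a least-squares fit of y on the columns plus an intercept."""
    resid = center(y)
    total = dot(resid, resid)
    basis = []
    for col in columns:
        q = center(col)
        for b in basis:  # remove the part already spanned by the basis
            coef = dot(q, b) / dot(b, b)
            q = [a - coef * c for a, c in zip(q, b)]
        if dot(q, q) < 1e-12:
            continue  # linearly dependent column adds nothing
        basis.append(q)
        coef = dot(resid, q) / dot(q, q)
        resid = [a - coef * c for a, c in zip(resid, q)]
    return 1.0 - dot(resid, resid) / total

def extend_models(models, descriptors, y, r_max=0.8, keep=10):
    """Add each admissible descriptor to each model; keep the best results."""
    candidates = {}
    for model in models:
        for name, values in descriptors.items():
            if name in model:
                continue
            # skip descriptors intercorrelated with any model member
            if any(abs(pearson(values, descriptors[m])) >= r_max for m in model):
                continue
            new = tuple(sorted(model + (name,)))
            if new not in candidates:
                candidates[new] = r_squared([descriptors[n] for n in new], y)
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:keep]

# Hypothetical data: "a_dup" is nearly collinear with "a" and is rejected.
pool = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [1, -1, 1, -1, 1, -1],
    "c": [0, 0, 1, 1, 0, 0],
    "a_dup": [1, 2, 3, 4, 5, 7],
}
prop = [a + 2 * b + 3 * c for a, b, c in zip(pool["a"], pool["b"], pool["c"])]
result = extend_models([("a", "b")], pool, prop)
print(result)
```

Repeating `extend_models` on its own output increases the model size by one parameter per round, which corresponds to resubmitting the best (n+1)-parameter models to the same procedure.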